Posts for tag: django

Configuring a Jenkins Slave

January 10 2012 by Colin Copeland

We're pretty avid testers here at Caktus and when one of our Django projects required upgrading to Python 2.7, we also needed to upgrade our Jenkins build environment. Luckily, Jenkins supports distributed builds to allow a master install to delegate tasks to slave instances. This way we can continue to run our primary build system on Ubuntu 10.04, which defaults to Python 2.6, and delegate tasks to an Ubuntu 11.04 environment running Python 2.7. The setup is fairly easy, but since I didn't find much out there already, I figured I'd write up a quick post outlining what we did.

To start, we'll need a new machine. I set up an Ubuntu 11.04 instance on Linode. Then SSH in, upgrade the packages, and install a Java Runtime Environment:

$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install default-jre

That's the only package Jenkins needs by default. Next we'll set up a user for Jenkins to SSH as. To do this, we'll add a new user to the system and copy the master's SSH public key:

$ sudo useradd -m jenkins
$ sudo -u jenkins mkdir /home/jenkins/.ssh
$ sudo -u jenkins vim /home/jenkins/.ssh/authorized_keys2

Now the master Jenkins client can ssh to the slave without a password. Next we need to configure the Jenkins master to connect to the slave. Head over to the Master environment and navigate to "Manage Jenkins" and then "Manage Nodes". Click "New Node" in the sidebar and add a Dumb Slave. On the following page, fill in the following fields:

  • # of executors: 2 (controls the number of concurrent builds)
  • Remote FS root: /home/jenkins
  • Labels: python27 natty
  • Usage: Leave this machine for tied jobs only
  • Launch method: Launch slave agents on Unix machines via SSH. Also fill in the Host field with the address of your slave machine.

Hit save and your Jenkins master should open a connection to your slave machine. To use the new slave machine, update an existing Jenkins job and set the "Restrict where this project can be run" Label Expression to "python27". You'll need to install any project dependencies on the slave for it to build properly, but that's basically it!

Class-based views in Django 1.3

December 29 2011 by Dan Poirier

Django class-based views


Introduction

Django 1.3 added class-based views, but neglected to provide documentation to explain what they were or how to use them. So here's a basic introduction.


Example of a very basic class-based view

Let's start with an example of a very basic class-based view.

urls.py:

...
url(r'^$', MyViewClass.as_view(), name='myview'),
...

views.py:

from django.views.generic.base import TemplateView

class MyViewClass(TemplateView):
    template_name = "index.html"

    def get(self, request, *args, **kwargs):
        context = {}  # compute what you want to pass to the template
        return self.render_to_response(context)

This will render your template index.html with the context you computed and return it as the content of an HttpResponse.


Introduction to class-based views

Now that we've seen the obligatory example, how about some instructions?

  • To create a class-based view, start by creating a class that inherits from django.views.generic.View or one of its subclasses.

  • In your URLconf, specify the view method as the name of the new class, plus .as_view():

    url(r'urlpattern', MyViewClass.as_view(), ...)

  • In your class, write a get method that takes as arguments self (as always), request (the HttpRequest), and any other arguments from the request as specified in your URLconf.

  • In your get method, use the same logic you'd have used in an old view, except that you can assume the request method is GET. Return an HttpResponse as usual.

  • If you need to handle POST, write a post method, just like your get method except that you can assume the request method is POST.

  • Any request method that you don't write a handler method for will automatically get back a "method not allowed" response; you don't have to do anything special.

Example:

from django.http import HttpResponse
from django.views.generic import View

class MyViewClass(View):
    def get(self, request, arg1, keyword=None):
        # Handle GET; arg1 and keyword come from the URLconf
        return HttpResponse("GET response")

    def post(self, request, arg1, keyword=None):
        # Handle POST with the same arguments
        return HttpResponse("POST response")
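
To make the "method not allowed" behavior less magical, here is a minimal plain-Python sketch of the dispatch idea. It does not use Django at all: SketchView, Response, and Hello are hypothetical names, and the real View class does more than this.

```python
class Response(object):
    """Tiny stand-in for an HTTP response (hypothetical, not Django's)."""

    def __init__(self, body="", status=200):
        self.body = body
        self.status = status


class SketchView(object):
    """Simplified model of method-based dispatch in a class-based view."""

    http_method_names = ["get", "post", "put", "delete", "head", "options"]

    def dispatch(self, method, *args, **kwargs):
        name = method.lower()
        handler = None
        if name in self.http_method_names:
            # Look for a handler method named after the HTTP method.
            handler = getattr(self, name, None)
        if handler is None:
            # No handler defined on the class: answer 405.
            return Response("Method Not Allowed", status=405)
        return handler(*args, **kwargs)


class Hello(SketchView):
    def get(self, name="world"):
        return Response("Hello, %s" % name)


ok = Hello().dispatch("GET")
rejected = Hello().dispatch("POST")
```

Here Hello only defines get, so the GET call succeeds and the POST call gets a 405 response without any extra code in Hello.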

Handy subclasses of View

Django comes with a number of useful subclasses of View. Just by inheriting from them, you get functionality that often ends up as boilerplate in views. You saw TemplateView being used already. You'll probably want to base your views on TemplateView almost anytime you're generating the content for a response.

Another useful one is RedirectView. This can be used to redirect all requests. Example:

from django.core.urlresolvers import reverse
from django.views.generic import RedirectView

class MyRedirectView(RedirectView):
    url = reverse(...)

That is a complete view, and will return a redirect to url on any GET, POST, or HEAD request.

You can optionally set permanent = False to return a temporary redirect instead of the default permanent redirect, and query_string = True to include any query string from the incoming request on the redirect URL:

from django.core.urlresolvers import reverse
from django.views.generic import RedirectView

class MyRedirectView(RedirectView):
    url = reverse(...)
    permanent = False
    query_string = True

Decorators

Unfortunately, using decorators with class-based views isn't quite as simple as using them with the old function-based views.

Maybe you're used to doing this:

from django.contrib.auth.decorators import login_required

@login_required
def myview(request):
    context = ...
    return render(request, 'index.html', context)

With class-based views, you have to decorate the .dispatch() method of the class view, which means you have to override it just to decorate it. And you need to decorate the decorator, because the decorators provided by Django expect to be decorating function-based views, not class-based ones:

from django.contrib.auth.decorators import login_required
from django.shortcuts import render
from django.utils.decorators import method_decorator
from django.views.generic.base import View

class MyViewClass(View):

    def get(self, request, **kwargs):
        context = ...
        return render(request, 'index.html', context)

    @method_decorator(login_required)
    def dispatch(self, *args, **kwargs):
        return super(MyViewClass, self).dispatch(*args, **kwargs)

This is an area of class-based views that could use some improvement.
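
To see why the extra wrapping is needed, here is a minimal plain-Python sketch with no Django involved. require_user and adapt_for_method are hypothetical names: the first stands in for a decorator written for plain view functions, and the second is a simplified stand-in for method_decorator, not Django's implementation. The request is just a dict for illustration.

```python
import functools


def require_user(view_func):
    """Decorator for plain view functions: expects request as first argument."""
    @functools.wraps(view_func)
    def wrapper(request, *args, **kwargs):
        if not request.get("user"):
            return "redirect to login"
        return view_func(request, *args, **kwargs)
    return wrapper


def adapt_for_method(decorator):
    """Simplified stand-in for method_decorator (an assumption, not Django's code)."""
    def outer(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            # Hide self inside a closure so the decorator sees function(request, ...).
            def bound(*a, **kw):
                return method(self, *a, **kw)
            return decorator(bound)(*args, **kwargs)
        return wrapper
    return outer


class Page(object):
    @adapt_for_method(require_user)
    def dispatch(self, request):
        return "hello %s" % request["user"]


allowed = Page().dispatch({"user": "ann"})
denied = Page().dispatch({})
```

Without the adapter, require_user would receive the view instance where it expects the request, which is the mismatch the paragraph above describes.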

You could apply the decorator in urls.py without needing so much extra code:

urls.py:

from django.contrib.auth.decorators import login_required
...
    url(r'^$', login_required(MyViewClass.as_view()), name='myview'),
...

but that moves the policy from the view code to the URLconf, which is not where people will be expecting to have to look for it, so I wouldn't recommend it.


Passing arguments to the view

The method signature for get(), post(), etc. in a view class is:

def get(self, request, *args, **kwargs)

Any unnamed values captured in the URLconf regular expression are passed in args, and any named values are passed in kwargs, just like before.

You can pass extra arguments to your view using the third element of your URLconf, the same as before, or using a new technique -- passing them to the .as_view() call in your url settings. E.g.

...
    url(r'^$', MyViewClass.as_view(extra_arg=3), name='myview'),
...

One warning - don't accidentally write MyViewClass(extra_arg=3).as_view(). That'll still appear to work, but that extra_arg is just thrown away.
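
The reason is easier to see in a simplified model of as_view(). The sketch below is plain Python, not Django's code: MiniView is a hypothetical class, and it only imitates two things I'm confident about in the real as_view(), namely that it builds a fresh instance per request from the keyword arguments it was given, and that it only accepts keywords that already exist as attributes on the class.

```python
class MiniView(object):
    """Simplified model of a class-based view (not Django's real code)."""

    extra_arg = None  # must exist on the class to be accepted by as_view()

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

    @classmethod
    def as_view(cls, **initkwargs):
        for key in initkwargs:
            if not hasattr(cls, key):
                raise TypeError(
                    "%s is not an attribute of %s" % (key, cls.__name__))

        def view(request):
            # A fresh instance is built for every request, using only
            # the keyword arguments that were passed to as_view().
            return cls(**initkwargs).get(request)
        return view

    def get(self, request):
        return "extra_arg=%r" % (self.extra_arg,)


right = MiniView.as_view(extra_arg=3)
wrong = MiniView(extra_arg=3).as_view()
```

Calling right("request") reports extra_arg=3, while wrong("request") reports extra_arg=None: the instance that received the argument is never the one that handles the request.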


Where's the beef?

So far, all we've done is the same behavior, written using a different syntax. But class-based views enable a whole new level of function.

Suppose you've got a view that displays some data on a web page, and you write it as a class-based view. Maybe something like this:

from django.views.generic.base import TemplateView

class MyViewClass(TemplateView):
    template_name = 'index.html'

    def get(self, request, **kwargs):
        # Lots of complex logic in here to compute 'context'
        return self.render_to_response(context)

Now you're asked to provide an HTTP API that returns the same data in JSON.

Start by refactoring your existing class slightly, moving your business logic out of the get() method:

from django.views.generic.base import TemplateView

class MyViewClass(TemplateView):
    template_name = 'index.html'

    def compute_context(self, request, **kwargs):
        # Lots of complex logic in here to compute 'context'
        return context

    def get(self, request, **kwargs):
        return self.render_to_response(self.compute_context(request, **kwargs))

Now, write a new class that subclasses your original class, uses the same method to compute the data, but overrides get() with different rendering code:

import json

from django.http import HttpResponse

class MyJsonViewClass(MyViewClass):
    def get(self, request, **kwargs):
        data = self.compute_context(request, **kwargs)
        # Very naive way to put your data into json, but a good starting place
        content = json.dumps(data)
        return HttpResponse(content, content_type='application/json')

Add a new URL to urls.py pointing to your new class-based view, and you're done. All the logic you worked out earlier is still in use, and the power of subclassing let you provide the data in a new format almost effortlessly.
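
One place where the naive json.dumps() call tends to break is on values such as dates, which the json module refuses to serialize by default. A minimal sketch of one way around that, using only the standard library (to_json_safe and the sample context are made up for illustration):

```python
import datetime
import json


def to_json_safe(value):
    """Fallback used by json.dumps for types it cannot serialize itself."""
    if isinstance(value, (datetime.date, datetime.datetime)):
        return value.isoformat()
    raise TypeError("Cannot serialize %r" % (value,))


# Hypothetical context, standing in for whatever compute_context() returns.
context = {"title": "Report", "created": datetime.date(2011, 12, 29)}

# default= is called only for values json.dumps does not know how to handle.
content = json.dumps(context, default=to_json_safe, sort_keys=True)
```

The resulting string can be passed to HttpResponse exactly as in the view above.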


Class-based views for common policy

The previous example was still something you could have done almost as easily with function-based views, by refactoring your code into separate functions and calling them from all your views.

A more powerful use of the new class-based views is to provide common function for many views. If you have a site with many views, and they all inherit from a common view, then you have the potential to change behavior across the site by changing that one view.

Previously, you would probably have used middleware for this kind of thing. The problem with middleware is that it's completely hidden from the view code. When working on your view, you won't even know middleware is affecting things unless you go look at the settings and track down each piece of middleware configured there.

Furthermore, middleware affects every request, not just the views you really wanted it for.

With a common class-based view, every view affected is declared to inherit from that view, making it obvious that we're inheriting behavior from elsewhere. With a good IDE, you can even jump straight to that superclass to inspect it. Any view that doesn't need the common behavior doesn't have to inherit it.


References

The only documentation page that really discussed class-based views in Django 1.3 is this one:

https://docs.djangoproject.com/en/1.3/topics/class-based-views/

Some of the rationale for the current design of class-based views, and pros and cons of some alternatives that were considered, are documented here:

https://code.djangoproject.com/wiki/ClassBasedViews

Beyond that, the best advice I can give is to go read the code. The code for the base View is surprisingly small, and can be found at django/views/generic/base.py.

OpenBlock Geocoder, Part 3: External Geocoders

December 28 2011 by Colin Copeland

The OpenBlock geocoder is powerful and robust. It uses PostGIS for spatial queries, can extract addresses from bodies of text, and can understand block and intersection notation. We've run into a few issues with it, however, including a low geocoding success rate. This is a tough problem to solve and depends on a lot of factors (the extent of street and block data in OpenBlock, format of the street addresses, etc.), so your mileage may vary. As an alternative, I put together a simple test using Google's Geocoding API, described below.

Disclaimer: This is the third post in our OpenRural series reviewing OpenBlock and its geocoder. You may wish to read Part 1: Data Model and Geocoding and Part 2: Text Parsing and Entity Extraction before proceeding.

Adding news with OpenBlock's geocoder

The Schema and NewsItem models provide OpenBlock with a generic data model to associate news with geographic locations. You can find a fairly extensive introduction in the official documentation, so we won't go into too much detail here.

Since a NewsItem requires a geographic point, let's use the OpenBlock geocoder to find 123 East Franklin Street:

>>> from ebpub.geocoder import SmartGeocoder
>>> geocoder = SmartGeocoder()
>>> location_name = '123 East Franklin Street'
>>> point = geocoder.geocode(location_name)['point']
>>> point.wkt
'POINT (-79.0553588124999891 35.9133110937499964)'

You'll notice that point has a wkt attribute. WKT, or well-known text, is a text markup language for representing geometry objects. Here we have a POINT, but the language can represent many geometries, including LineString and Polygon.
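
For a feel of how simple the POINT form is, here is a minimal sketch that pulls the coordinates back out of such a string with the standard library. parse_wkt_point is a hypothetical helper that only handles plain two-dimensional points; in a real project you would let GEOS do this. Note that WKT lists x before y, so longitude comes first.

```python
import re

# Matches only the simplest form: POINT (x y) with plain decimal numbers.
POINT_RE = re.compile(
    r"^POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)$")


def parse_wkt_point(wkt):
    """Return (longitude, latitude) from a 2D WKT POINT string."""
    match = POINT_RE.match(wkt.strip())
    if match is None:
        raise ValueError("Not a simple WKT POINT: %r" % (wkt,))
    x, y = match.groups()
    return float(x), float(y)


lon, lat = parse_wkt_point(
    "POINT (-79.0553588124999891 35.9133110937499964)")
```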

We'll use the "Local News" schema in this example as it is pre-loaded in OpenBlock:

>>> from ebpub.db import models as ebpub
>>> schema = ebpub.Schema.objects.get(name='Local News')

Using this schema, we'll add a new NewsItem with the point created above:

>>> import datetime
>>> news = schema.newsitem_set.create(
...     title='Incident downtown',
...     description='Something happened downtown today!',
...     item_date=datetime.date.today(),
...     location=point,
...     location_name=location_name,
... )
>>> news.location.wkt
'POINT (-79.0553588124999891 35.9133110937499964)'

That was easy. Now we have a NewsItem that OpenBlock is aware of and that can be plotted on a map. However, what do we do if we can't geocode the address?

Using an External Geocoder

If we already have a geographic point, then we can circumvent the geocoder entirely:

>>> from django.contrib.gis.geos import Point
>>> manual_point = Point(-79.0553588124999891, 35.9133110937499964)
>>> news = schema.newsitem_set.create(
...     title='Incident downtown',
...     description='Something happened downtown today!',
...     item_date=datetime.date.today(),
...     location=manual_point,
...     location_name=location_name,
... )
>>> news.location.wkt
'POINT (-79.0553588124999891 35.9133110937499964)'

This means we can also use an external geocoder. For example, we can use Google's Geocoding API with geopy. First, you'll need a Google Maps API key, which we'll use with geopy:

>>> GOOGLE_MAPS_API_KEY = '' # your Google Maps API key

Then we can use geopy to construct a new geocoder:

>>> from geopy import geocoders
>>> g = geocoders.Google(GOOGLE_MAPS_API_KEY)

And we can geocode our address:

>>> address = '123 East Franklin Street, Chapel Hill, NC'
>>> place, (lat, lng) = g.geocode(address)
>>> point = Point(lng, lat)
>>> point.wkt
'POINT (-79.0549350000000004 35.9136495999999994)'

You can even tap into OpenBlock's internals and build a Geocoder that OpenBlock can use:

from django.conf import settings
from django.contrib.gis.geos import Point

from geopy import geocoders
from geopy.geocoders.google import GQueryError

from ebpub.geocoder import Geocoder, DoesNotExist


class GoogleGeocoder(Geocoder):

    def __init__(self, *args, **kwargs):
        kwargs['use_cache'] = False # haven't implemented cache yet
        super(GoogleGeocoder, self).__init__(*args, **kwargs)
        self.geocoder = geocoders.Google(settings.GOOGLE_MAPS_API_KEY)

    def _do_geocode(self, location_string):
        try:
            place, (lat, lng) = self.geocoder.geocode(location_string)
        except (GQueryError, ValueError) as e:
            raise DoesNotExist(unicode(e))
        location = {'point': Point(lng, lat)}
        return location

This is a proof-of-concept geocoder we're using with OpenRural. You can find it on GitHub. Using this geocoder with a sample dataset from the North Carolina Secretary of State Corporation Filings, I was able to increase the geocoding success rate from about 37% to 95%. Again, your mileage will vary, but it can be useful to test out. We can't use Google's API for everything though. Normal users are limited to 2,500 requests per day. Business accounts are allotted 100,000 requests. Additionally, Google requires you to display any points geocoded with their API on a Google Map. So you'll need to evaluate your needs before deciding on using Google's API.
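
One way to stay under a request quota is to try the local geocoder first and only call the external one when it fails. The sketch below shows that fallback pattern in plain Python. GeocodeError, geocode_with_fallback, and the two fake geocoders are hypothetical stand-ins, not part of OpenBlock or geopy.

```python
class GeocodeError(Exception):
    """Raised by a geocoder that cannot resolve an address."""


def geocode_with_fallback(address, geocoders):
    """Try each (name, callable) pair in order; return the first success."""
    errors = []
    for name, geocode in geocoders:
        try:
            return name, geocode(address)
        except GeocodeError as exc:
            errors.append("%s: %s" % (name, exc))
    raise GeocodeError("; ".join(errors))


def fake_local(address):
    # Stands in for a local geocoder that has no matching block.
    raise GeocodeError("no matching block")


def fake_external(address):
    # Stands in for an external service; returns (longitude, latitude).
    return (-79.054935, 35.9136496)


source, point = geocode_with_fallback(
    "123 East Franklin Street, Chapel Hill, NC",
    [("local", fake_local), ("external", fake_external)],
)
```

Ordering the list with the local geocoder first means the external service is only consulted for addresses the local data cannot resolve.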

Using Django and Celery with Amazon SQS

December 19 2011 by Tobias McNulty

Amazon's Simple Queue Service (SQS) is a relatively new offering in the family of Amazon Web Services (AWS). It's also an appealing one, because it proposes to quickly and easily replace a common component of the stack in a typical web application, thereby obviating the need to run a separate queue server like RabbitMQ. While RabbitMQ, the typical favorite for Celery users, is not necessarily difficult to install or maintain, removing it from the stack of a web application means one less component that might fail. Offloading that service to AWS might also prove financially advantageous, especially for applications with a small to moderate queue volume.

While it's quite easy to use Celery with Amazon's Simple Queue Service (SQS), there's currently not a lot of information out there about how to do it. There's this post on the celery-users list that didn't leave me with much hope, and this question on StackOverflow that sounded slightly more promising. I still couldn't find a step-by-step how-to, however, and it ended up being quite easy, so here's my take:

  1. Upgrade to the latest versions of kombu, celery, and django-celery. At the time of this writing, those versions are 1.5.1, 2.4.5, and 2.4.2, respectively:

    pip install kombu==1.5.1
    pip install celery==2.4.5
    pip install django-celery==2.4.2
    
  2. Add the following lines to settings.py (or local_settings.py depending on your setup):

    BROKER_TRANSPORT = 'sqs'
    BROKER_TRANSPORT_OPTIONS = {
        'region': 'us-east-1',
    }
    BROKER_USER = AWS_ACCESS_KEY_ID
    BROKER_PASSWORD = AWS_SECRET_ACCESS_KEY
    

    In the above, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY should point to the appropriate AWS access key and secret for the account you want to use. Pro tip: Use AWS's Identity and Access Management (IAM) to set up an API key and secret that only has access to the services your web application will use (typically one or more of SQS, SES, and SimpleDB).

  3. Finally, if you'll be running multiple servers or environments on the same AWS account (e.g., two different web apps or staging and production environments of the same app), you may want to customize the SQS queue name being used (the default is "celery"). To make this change, add the following lines to your settings.py (or again, local_settings.py):

    CELERY_DEFAULT_QUEUE = 'celery-myapp-production'
    CELERY_QUEUES = {
        CELERY_DEFAULT_QUEUE: {
            'exchange': CELERY_DEFAULT_QUEUE,
            'binding_key': CELERY_DEFAULT_QUEUE,
        }
    }
    

For the curious, Celery's support for SQS lies in the underlying Kombu library, the latest version of which includes a transport for SQS. While some posts I found (including the StackOverflow post) suggest using the BROKER_URL syntax for pointing to AWS, I found it simpler to use the BROKER_USER and BROKER_PASSWORD variables. I also saw some reports that slashes in your API secret could confuse the underlying URL parser, and since my API secret happened to include a number of slashes, I went straight to using BROKER_USER and BROKER_PASSWORD.
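
If you do want to try the BROKER_URL route, the usual way to keep slashes from confusing a URL parser is to percent-encode the credentials first. The sketch below shows only that encoding step with the standard library. build_broker_url and the sample credentials are made up, and whether the kombu and celery versions listed above accept a URL of this shape is an assumption I have not verified.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2


def build_broker_url(access_key, secret_key):
    """Percent-encode both credentials so '/' and '+' cannot break the URL."""
    # safe="" forces '/' to be encoded too; by default quote() leaves it alone.
    return "sqs://%s:%s@" % (quote(access_key, safe=""),
                             quote(secret_key, safe=""))


# Made-up credentials, for illustration only.
url = build_broker_url("AKIAEXAMPLE", "abc/def+ghi")
```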

Anyways, I hope this helps someone else looking to solve the same problem, and don't hesitate to comment if you run into any issues or have a better way to go about this!

OpenBlock Geocoder, Part 2: Text Parsing and Entity Extraction

December 16 2011 by Colin Copeland

This is the second post in our OpenRural series reviewing OpenBlock and its geocoder. OpenBlock Geocoder, Part 1: Data Model and Geocoding covers the internals of the OpenBlock geocoder and its geocoding capabilities. As this post builds upon topics covered there, you may wish to read Part 1 before proceeding. In this post we step back from the internals of the geocoder and explore how to use it along with other OpenBlock tools to parse unstructured text.

I'd also like to give a shout out here to Paul Winkler who was kind enough to answer questions and point me in the right direction on the topics below. Thanks Paul!

The Problem

OpenBlock's original design is centered around providing news at a hyper-local level. That is, down to your own city block. This allows interested citizens to see events ranging from police incidents, to restaurant inspections, to local news articles all aggregated on a map of your block. OpenBlock provides scraping tools to assist downloading this data from the web, but the obvious problem here is that most data isn't packaged or tagged with geographic information. Let's look at an example article teaser from The Daily Tar Heel in Chapel Hill, NC:

No. 4 North Carolina led Evansville 63-27 with just more than 14 minutes to go in the first half when senior forward Tyler Zeller scored his 999th career point at the Smith Center on Tuesday night.

The article mentions the game at the Smith Center, which is the location we want to extract and plot on a map. This is where OpenBlock's utilities for ingesting unstructured text help.

Places

Places are simple models containing only a name and geographic point. OpenBlock implements a mechanism to find places defined in the database from a body of text. For example, say we have the following string we'd like to parse:

>>> message = 'A good movie is playing at the Varsity Theater in Chapel Hill tonight.'

OpenBlock can extract "Varsity Theater" if we define it as a Place. You can create and import places in the OpenBlock admin, but to keep things simple, we'll just create one here:

Here we created a new Point of Interest place (which is loaded by default on any OpenBlock install) geocoded to 123 East Franklin Street. Now we need a way to parse places from strings. Most of this functionality is found in ebdata, which contains a Natural Language Processing package, nlp. We can use its place_grabber to extract matching places:

We can feed this right back into the Place model to retrieve the database objects and their geographic locations:

The parser is case sensitive however, so it'll fail if it's not an exact match:

>>> grabber("VARSITY THEATER")
[]

Obviously this is a brute-force method and requires you to pre-load all places of interest into the database beforehand. It's pretty rudimentary, but does provide this functionality out-of-the-box.
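
To illustrate the general idea (not OpenBlock's actual implementation), here is a minimal sketch of a case-sensitive grabber built from a list of known names. make_grabber is a hypothetical helper. It compiles one pattern up front and returns (start, end, name) tuples for each match.

```python
import re


def make_grabber(names):
    """Build a function that finds any of the given names in a text."""
    if not names:
        return lambda text: []
    # Longest names first, so a longer name wins over a shorter prefix of it.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in ordered))

    def grabber(text):
        # Case-sensitive, exact matching only.
        return [(m.start(), m.end(), m.group(0))
                for m in pattern.finditer(text)]
    return grabber


sketch_grabber = make_grabber(["Varsity Theater"])
message = ("A good movie is playing at the Varsity Theater "
           "in Chapel Hill tonight.")
found = sketch_grabber(message)
```

As with the real grabber, an all-caps "VARSITY THEATER" would not match here, because nothing in the pattern ignores case.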

Locations

OpenBlock can also extract locations defined in the database. We already have cities loaded, so we'll use them in this example. Just like the place grabber, the location grabber is case sensitive, so we'll define a location synonym with the proper case:

>>> from ebpub.db.models import Location, LocationSynonym
>>> ch = Location.objects.get(name='CHAPEL HILL')
>>> LocationSynonym(pretty_name='Chapel Hill', location=ch).save()

By default, the location grabber ignores types of "city" and "borough". To keep things simple, we'll just create one that includes all location types:

>>> grabber = places.location_grabber(ignore_location_types=[])

Now we can use the grabber to extract locations:

>>> grabber(message)
[(50, 61, 'Chapel Hill')]

The OpenBlock grabbers cache the locations/places on instantiation, so if you plan to parse a lot of text in succession, you won't hit the database after the initial run. Cool!

Addresses

ebdata.nlp can also parse addresses. For example, let's use a simple string:

>>> from ebdata.nlp.addresses import parse_addresses
>>> parse_addresses('The Varsity Theater is located at 123 N Franklin St')
[('123 N Franklin St', '')]

Under the hood, OpenBlock uses a large regular expression to do this, so it's not actually hitting the database or attempting to do geocoding. You'll notice that it returns a 2-item tuple. The second item is for the city:

>>> parse_addresses('The individual was seen on 123 N Franklin St in Chapel Hill')
[('123 N Franklin St', 'Chapel Hill')]

It can parse block locations too:

>>> parse_addresses('The construction is on the 100 block of Franklin St.')
[('100 block of Franklin St.', '')]

And intersections:

>>> parse_addresses('The incident occurred at the intersection of Franklin and Hillsborough')
[('Franklin and Hillsborough', '')]
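
To give a feel for the regular-expression approach, here is a deliberately tiny pattern for plain street addresses. It is far smaller than OpenBlock's and is my own assumption about what a minimal version might look like: ADDRESS_RE and find_addresses are hypothetical, the suffix list is incomplete, and blocks, intersections, and cities are not handled.

```python
import re

ADDRESS_RE = re.compile(
    r"\b\d+\s+"                 # house number
    r"(?:[NSEW]\s+)?"           # optional one-letter direction
    r"(?:[A-Z][a-z]+\s+)+"      # one or more capitalized street-name words
    r"(?:St|Ave|Rd|Blvd|Dr|Ln)\b"  # a few common suffixes
)


def find_addresses(text):
    """Return every substring that looks like a simple street address."""
    # All groups are non-capturing, so findall returns the full matches.
    return ADDRESS_RE.findall(text)


hits = find_addresses("The Varsity Theater is located at 123 N Franklin St")
misses = find_addresses("No. 4 North Carolina led Evansville 63-27")
```

Because nothing here consults the database, any text that merely looks like an address will match, which is the trade-off of a purely pattern-based parser.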

It all comes together with the geocoder:

Conclusion

As you can see, OpenBlock provides a few useful utilities to parse unstructured text. They're fairly limited and, especially with the address parser, will most likely return a lot of false positives. But I think OpenBlock has provided a great starting point. Stay tuned for more posts on the inner workings of the OpenBlock project!

OpenBlock Geocoder, Part 1: Data Model and Geocoding

December 12 2011 by Colin Copeland

As Tobias mentioned in Scraping Data and Web Standards, Caktus is collaborating with the UNC School of Journalism to help develop Open Rural (the code is on GitHub). Open Rural hopes to help rural newspapers in North Carolina leverage OpenBlock. This blog post is the first of several covering the internals of OpenBlock and, specifically, the geocoder.

OpenBlock Data Model

The OpenBlock geocoder can only geocode from the data it has. It doesn't leverage a 3rd-party API or service. It only uses what's loaded in PostgreSQL (with PostGIS and GeoDjango) and, in this example, what comes from the US Census Bureau and local city and county GIS offices.

Further, the imported data is typically filtered by a bounding box setting in METRO_LIST. The setting, extent, is a list of leftmost longitude, lower latitude, rightmost longitude, upper latitude. This defines a bounding box - the range of latitudes and longitudes that are relevant to your area. A small or restrictive box will limit imported ZIP code and block data to areas that fall within the box.
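
The bounding-box test itself is simple arithmetic. Here is a minimal sketch of it in plain Python. in_extent is a hypothetical helper, not an OpenBlock function, and the coordinates are borrowed from the examples in this series plus an approximate point for Raleigh.

```python
def in_extent(extent, lon, lat):
    """Return True if (lon, lat) falls inside the bounding box.

    extent is (leftmost longitude, lower latitude,
               rightmost longitude, upper latitude).
    """
    left, lower, right, upper = extent
    return left <= lon <= right and lower <= lat <= upper


# An extent covering Chapel Hill and much of Orange County.
ORANGE_COUNTY = (-79.165922, 35.829095, -78.978468, 36.02426)

franklin_st = (-79.0553588, 35.9133110)  # 123 East Franklin Street
raleigh = (-78.6382, 35.7796)            # approximate, outside the box

inside = in_extent(ORANGE_COUNTY, *franklin_st)
outside = in_extent(ORANGE_COUNTY, *raleigh)
```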

Let's look at an example with these shapefiles:

We'll start with a restrictive extent that only consists of downtown Chapel Hill:

METRO_LIST = (
    {
        # Extent of the region, as a longitude/latitude bounding box.
        'extent': (-79.066272, 35.910663, -79.040481, 35.91671),
        # ...
    },
)

This selection loaded 2 ZIP codes:

$ django-admin.py import_nc_zips
Importing zip codes...
# ...
Skipping 27511, out of bounds
Skipping 27513, out of bounds
Created ZIP Code 27514 
Created ZIP Code 27516 
Skipping 27517, out of bounds
Skipping 27519, out of bounds
# ...
Created 2 zipcodes.

And limited the block data as well:

$ django-admin.py import_county_streets 37135
Importing blocks, this may take several minutes ...
Created 73 blocks
Populating streets and fixing addresses, these can take several minutes...
Populating the streets table
streets: created: 28
block_intersections: created: 160
Done.

Restricting the area will limit the ability of the geocoder. In this case, for example, it can geocode the intersection of Franklin and Henderson, which is right downtown, but not Franklin and Estes (don't worry, we'll get into more geocoding details in the next section). A map helps illustrate this more clearly. Below you can see the bounding box with pins on the two intersections:



[Map: OpenRural - Downtown Chapel Hill]

If we increase the bounding box, we'll get a lot more data:

METRO_LIST = (
    {
        # Extent of the region, as a longitude/latitude bounding box.
        'extent': (-79.165922, 35.829095, -78.978468, 36.02426),
        # ...
    },
)

With an extent that encompasses all of Chapel Hill, the importer loaded 9 ZIP codes, 4302 blocks, 1699 streets, and 7189 intersections. Here's a map illustrating the larger extent:



[Map: OpenRural - Orange County, NC]

It's up to the maintainer of an OpenBlock install to determine which extent to use as it is based on the specifics of the application. A large extent will import more ZIP codes and blocks and, therefore, will slow down geospatial queries and may include unwanted geographic areas.

Street

Now that we have NC Orange County data loaded, let's investigate this data with the OpenBlock models.

The Street model contains a catalog of all loaded streets. It's a simple model with only a few fields:

  • street
  • pretty_name
  • street_slug
  • suffix
  • city
  • state

In NC Orange County, we can see that the street data spans 4 cities:

>>> from ebpub.streets.models import Street
>>> Street.objects.order_by('city').values_list('city', flat=True).distinct()
[u'', u'CARRBORO', u'CHAPEL HILL', u'DURHAM', u'HILLSBOROUGH']

Some streets cross city lines and therefore contain two entries:

>>> Street.objects.filter(street_slug='rosemary-st').values_list('city', flat=True)
[u'CARRBORO', u'CHAPEL HILL']

And, for example, if we're looking for Franklin St. in Chapel Hill, NC, we can filter for it here:

Blocks

Blocks are fundamental to OpenBlock and are used by the geocoder. OpenBlock defines a block as "a segment of a single street between one side street and another side street." The Block model is slightly more intricate than Street, but each entry basically represents the address range of a street for each block segment.
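
As a rough mental model of "address range per block segment", here is a plain-Python sketch that looks up the block containing a house number. The field names from_num, to_num, predir, and street_slug mirror the ones used in the queries in this post, but the rows themselves are invented for illustration and find_block is a hypothetical helper.

```python
# Invented sample rows; real data comes from the imported block shapefiles.
blocks = [
    {"street_slug": "franklin-st", "predir": "E", "from_num": 100, "to_num": 199},
    {"street_slug": "franklin-st", "predir": "E", "from_num": 200, "to_num": 299},
    {"street_slug": "franklin-st", "predir": "W", "from_num": 100, "to_num": 199},
]


def find_block(blocks, number, street_slug, predir):
    """Return the first block whose address range contains number, or None."""
    for block in blocks:
        if (block["street_slug"] == street_slug
                and block["predir"] == predir
                and block["from_num"] <= number <= block["to_num"]):
            return block
    return None


match = find_block(blocks, 123, "franklin-st", "E")
missing = find_block(blocks, 5000, "franklin-st", "E")
```

A number outside every range comes back as None, which is the situation the geocoder reports as a valid street with an invalid block.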

To start, we can see that Franklin St. is divided into roughly 32 blocks:

>>> from ebpub.streets.models import Block
>>> Block.objects.filter(street_slug='franklin-st').count()
32

It's sectioned into an east and west segment:

>>> Block.objects.filter(street_slug='franklin-st').order_by('street_pretty_name').values_list('street_pretty_name', 'predir').distinct()
[(u'Franklin St.', u'W'), (u'Franklin St.', u'E')]

And can have an address between 100 and 1899:

>>> from django.db.models import Max, Min
>>> Block.objects.filter(street_slug='franklin-st').aggregate(Min('from_num'), Max('to_num'))
{'from_num__min': 100, 'to_num__max': 1899}

So we can find the block that contains the 123 address:

Also, on a side note, it's possible for some blocks to span cities:

Geocoding

Now that we have a basic understanding of how the data is stored within OpenBlock, let's do some geocoding. Most of these examples will use the SmartGeocoder class. SmartGeocoder delegates to specific geocoders (AddressGeocoder, BlockGeocoder, and IntersectionGeocoder) based on how it interprets the string with regular expressions.

Addresses

To start, let's geocode "123 East Franklin Street":

This one was pretty easy for the geocoder to parse and find. You can see that not only has it found the associated block, but it also knows the exact geographic point. However, this will fail if passed a non-existent address number (InvalidBlockButValidStreet):

In this case, the geocoder was able to extract the address, but it failed to find the associated block in the database. Non-existent streets also fail (DoesNotExist):

Intersections

The geocoder can locate intersections too:

Notice how the intersection field is populated, rather than block. This will raise a DoesNotExist exception when an intersection is not found:

Street Misspellings

OpenBlock provides a model, StreetMisspelling, to define street aliases. This allows you to map a bad street name to a good street name that exists in the database:

Now geocoding "Glen Haven" will find "Glenhaven".

Multiple Cities

By default, OpenBlock is configured to work with a single city, which is defined in METRO_LIST:

# Metros. You almost certainly only want one dictionary in this list.
# See the configuration docs for more info.
METRO_LIST = (
    {
        # Extent of the region, as a longitude/latitude bounding box.
        'extent': (-79.165922, 35.829095, -78.978468, 36.02426),

        # The major city in the region.
        'city_name': 'Chapel Hill', 
    },
)

The geocoder will fail if it locates a street that's associated with a city unknown to OpenBlock. For example, 100 Pine Street is in Carrboro and not Chapel Hill:

This street exists in the database due to our extent covering most of Orange County. Since we've set up OpenBlock to encompass an entire county, rather than a single city, we need to define additional cities. This can be accomplished in one of two ways:

  • Add additional dictionaries to METRO_LIST for each city
  • Import city locations into the database and tell OpenBlock to refer to these

We imported Orange County city boundary data above, so we'll use the latter:

METRO_LIST = (
    {
        # Extent of the region, as a longitude/latitude bounding box.
        'extent': (-79.165922, 35.829095, -78.978468, 36.02426),

        # Set this to True if the region has multiple cities.
        # You will also need to set 'city_location_type'.
        'multiple_cities': True,

        # The major city in the region.
        'city_name': 'Chapel Hill',

        # Slug of an ebpub.db.LocationType that represents cities.
        # Only needed if multiple_cities = True.
        'city_location_type': 'cities',
    },
)

Here we enabled multiple_cities and told OpenBlock that the slug of the location type representing cities is cities. Now 100 Pine Street will geocode properly:

What's Next

Now that we've had an overview of the geocoder, we'll jump into OpenBlock's place, location, and address parser. Stay tuned!

Update: Read more in OpenBlock Geocoder, Part 2: Text Parsing and Entity Extraction.

Django Without the Web

October 24 2011 by Dan Poirier

One of the things I like best about Django is how easy its ORM makes it to work with databases. Too bad Django is only for web applications. Sure, you could deploy a Django app and then make use of it from a non-web application using a REST API, but that would be too awkward.

But there is an easy way to use Django without the web! Here's the trick: write your application as Django management commands. Then you can run it from the command line. Just like 'manage.py syncdb' or 'manage.py migrate', you can run 'manage.py my_own_application' and your application has access to the full power of Django's ORM.

Adding a new Django management command is surprisingly easy:

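  1. Add a management/commands directory to your application, with an empty __init__.py file in both management and commands so Python treats them as packages.
  2. Create an anything.py file containing a class named Command that extends django.core.management.base.BaseCommand or a subclass.
  3. Write a handle method that runs your application.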
  4. Run 'manage.py anything'

Here's an example of a trivial command:

from django.core.management.base import BaseCommand

class Command(BaseCommand):
    def handle(self, *args, **kwargs):
        print "Hello, world"

Create a management/commands directory in your application and save this there as 'hello.py'.

Now try it:

$ ./manage.py hello
Hello, world
$

How about doing something useful?  Here's an example that prints out all of your invoices, so you can see how easy it is to access your data:

from django.core.management.base import BaseCommand
from appname.models import Invoice

class Command(BaseCommand):
    def handle(self, *args, **kwargs):
        print "Invoices"
        for invoice in Invoice.objects.order_by('date'):
            print u"%s %s" % (invoice.date, invoice.summary)

I've used custom management commands to do things like importing data where something more complicated than loading a fixture was needed.

For more details, see the Django documentation.

Caktus 2012 Summer Internship Program

October 12 2011 by Nicole Foster

I'm excited to announce that Caktus is looking for candidates for our summer internship program. It is a 12 week paid position in our Carrboro, NC office. We're within driving distance of UNC Chapel Hill, NC State University in Raleigh, and Duke in Durham, so students from all parts of the NC Research Triangle are welcome to apply.

We are looking for a web developer who enjoys working on a team and is excited to work on new and diverse projects. While working with us you will get to work on Django-powered web applications, learn about test driven development and other agile methodologies, perform front-end development in HTML, CSS and JavaScript (jQuery) and become familiar with Linux (Debian-flavor) desktop and server systems. Check out the full job posting here.

If you'd like to spend your summer working with some great people on interesting projects please email us at jobs+website@caktusgroup.com with your resume and, if applicable, links to samples of code you have written. Kindly include a brief note describing why you would be a great fit for this opportunity.

Caktus Hosts 3rd Django Sprint in North Carolina

October 10 2011 by Nicole Foster

Here at Caktus, we love Django and use it to make all of our web applications. To help support the Django community, we are hosting a development sprint on November 12th and 13th at our office in Carrboro, NC in preparation for the 1.4 release. The sprint is a great excuse for people to get together and focus their undivided attention on improving Django. You will be helping out by providing bug fixes, improving the documentation and also adding features to existing packages.

If you would like to participate in the sprint, no previous experience is necessary and this would be a great time to start contributing. Mark wrote a great blog piece about how to get started contributing to Django through sprinting that you can read here.

We'll be here at 9:00 AM both days and the day usually ends between 4:00 and 5:00 PM, depending on the momentum. Afterwards everyone gets together for dinner and drinks. If you would like to attend, please RSVP at the Eventbrite page, and if you cannot make it to the office, please submit your name to the online roster.

We look forward to seeing you!

Caktus Group Welcomes Designer and Front End Developer Julia Elman

September 30 2011 by Tobias McNulty

I'm delighted to announce that Julia Elman has joined our growing team of web developers here at Caktus. Julia started her design career almost 10 years ago in an internal marketing group, and first learned about Django at the SXSW Interactive Festival in 2008. Prior to joining the Caktus team, Julia worked at the Lawrence Journal World (the birthplace of Django) and as a freelance designer.

Caktus is a seasoned team of web developers that creates interactive, content-rich sites and applications with the Django web framework. We put a strong emphasis on best practices, employ an agile method, and also actively participate in the Django development community.

For more information about Caktus and our team, check out our newly updated team page!

Bulk inserts in Django

September 20 2011 by Dan Poirier

I recently found a way to speed up a large data import far more than I expected.

The task was to read data from a text file and create data records in Django, and the naive implementation was managing to import about 55 records per second, which was going to take far too long given the amount of data that needed to be imported.

My co-worker Karen Tracey suggested changing to bulk inserts. Instead of creating and saving one Django record at a time, we'd create a whole batch of Django objects, then save them all in one SQL operation. I figured reducing the number of database round-trips would speed things up somewhat, but was not prepared for the actual numbers - I'm consistently getting around two orders of magnitude improvement compared to single record inserts.

As I scaled up, I made one more change - instead of doing the insert in one batch, I limited each batch to a few hundred records. I didn't want to store an unlimited number of Django objects in memory at once, and some benchmarking showed that the benefit of batching the inserts leveled off at a few hundred records.

Caveats

There are a few differences from normal object creation. First, save() is not called on the instances, nor are post_save signals sent, and the model instances' primary keys are not set. If you're doing anything more complicated than dumping a bunch of data into the database, you'll probably need to stick with creating objects individually.

Also, the code we're using to do the bulk insert does not handle ForeignKeys properly. The workaround when creating the Django objects is to set the value of any ForeignKey field to the primary key of the object referred to, if any.

Example

Here's what code for a bulk insert might look like.

from bulkops import insert_many
from our_models import Book

objects = []
for data in data_source:
    # Assume data['foreign_key'] is a reference to another model
    # Change that to its primary key
    data['foreign_key'] = data['foreign_key'].pk
    objects.append(Book(**data))
    # Keep our batch size from getting too big
    if len(objects) > 200:
        insert_many(objects)
        objects = []
insert_many(objects)
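The batching logic can also be pulled out into a small helper. This is only a sketch: insert_in_batches is a hypothetical name, and insert_fn stands in for whatever does the actual work (insert_many in the example above). It also skips the final call when the last batch is empty.

```python
def insert_in_batches(items, insert_fn, batch_size=200):
    """Pass items to insert_fn in lists of at most batch_size.

    Returns the total number of items handed to insert_fn.
    """
    batch = []
    total = 0
    for item in items:
        batch.append(item)
        if len(batch) >= batch_size:
            insert_fn(batch)
            total += len(batch)
            batch = []
    if batch:  # flush the final partial batch; skip it if empty
        insert_fn(batch)
        total += len(batch)
    return total


# Stand-in for insert_many: just record the size of each batch.
sizes = []
count = insert_in_batches(range(450), lambda batch: sizes.append(len(batch)))
```

Here 450 items are split into batches of 200, 200 and 50, so memory use stays bounded no matter how large the data source is.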

Django 1.4

The current development branch of Django has added a bulk insert feature, which seems likely to be included in Django 1.4. It's very similar to the code we're using here - just change "insert_many(objects)" to "Book.objects.bulk_create(objects)". That's subject to change before Django 1.4 is released, of course.

Credit

Credit goes to Karen for suggesting the approach to me, and Ole Laursen's blog post for the original idea and the implementation that we're using.

Links

Ole Laursen's blog post: http://ole-laursen.blogspot.com/2010/11/bulk-inserting-django-objects.html

Implementation: http://people.iola.dk/olau/python/bulkops.py

Original commit to Django development: https://code.djangoproject.com/changeset/16739

Testing Web Server Configurations with Fabric and ApacheBench

September 13 2011 by Tobias McNulty

Load testing a site with ApacheBench is fairly straightforward. Typically you'd just SSH to a machine on the same network as the one you want to test, and run a command like this:

ab -n 500 -c 50 http://my.web.server/path/to/page/

The -n argument determines the number of requests to execute, and the -c argument determines the concurrency level, or how many requests will be running simultaneously at any given time.

For Python and Django web applications, Fabric is a popular tool for deploying code to and running other commands on remote servers. It's built in Python, and its simple syntax makes it easy to use as well. For more information and a primer on Fabric, check out the post that Colin Copeland wrote back in 2010, titled Basic Django deployment with virtualenv, fabric, pip and rsync.

Running ApacheBench from Fabric is useful because you can easily do other things like customize and update your web server configuration in an automated way. For example, here's a sample template for an Apache server configuration that I upload to our web servers using Fabric:

ServerName %(www_server_name)s

WSGIDaemonProcess my_site-%(environment)s processes=%(process_count)s threads=%(thread_count)s display-name=%%{GROUP}
WSGIProcessGroup my_site-%(environment)s
WSGIScriptAlias / %(apache_root)s/%(environment)s.wsgi

ErrorLog %(log_root)s/wsgi.error.log
LogLevel info
CustomLog %(log_root)s/wsgi.access.log combined

You'll notice the %s-style Python string formatting syntax in the Apache config. These are populated by Fabric's files.upload_template method when the file is copied to the remote server, and are based on variables you pass in to the context. Here's a sample Fabric method to upload your Apache configuration to the remote server:

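# Imports used by the fabfile snippets in this post (Fabric 1.x).
import datetime
import os

from fabric.api import env, run, sudo
from fabric.contrib import files

def _join(*items):
    """
    We're deploying to Linux, so hard code that type of path join here. Using
    os.path.join would not work when deploying from Windows.
    """
    return '/'.join(items)

def apache_graceful():
    # Gracefully restart Apache so in-flight requests can finish.
    sudo('/etc/init.d/apache2 graceful')

def update_apache_conf(process_count=15, thread_count=1):
    # Store the counts on env so the templates and benchmark() can see them.
    env.process_count = process_count
    env.thread_count = thread_count
    for ext in ['conf', 'wsgi']:
        source = os.path.join(env.deployment_dir, 'templates',
                              'apache.%s' % ext)
        dest = _join(env.home, 'apache.conf.d',
                     '.'.join([env.environment, ext]))
        # env doubles as the template context (www_server_name, log_root, ...).
        files.upload_template(source, dest, context=dict(env), mode=0755,
                              use_sudo=True)
    apache_graceful()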

Specifying process_count and thread_count in the arguments to update_apache_conf() means that I can pass those in from the command line, like so:

fab staging update_apache_conf:10,3

This would install an Apache configuration on the server that starts up 10 mod_wsgi processes with 3 threads each.
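The substitution in the template is ordinary Python % formatting against a dictionary, which is why the literal percent sign in display-name has to be doubled. Here is a minimal sketch using a cut-down version of the template line and a hypothetical context; the values are made up for illustration.

```python
# A shortened, hypothetical version of the WSGIDaemonProcess line above.
template = (
    "WSGIDaemonProcess my_site-%(environment)s "
    "processes=%(process_count)s threads=%(thread_count)s "
    "display-name=%%{GROUP}"
)

# Assumed context values, matching `update_apache_conf:10,3` on staging.
context = {"environment": "staging", "process_count": 10, "thread_count": 3}

# %(name)s is looked up in the dictionary; %% collapses to a single %.
rendered = template % context
```

The rendered line keeps %{GROUP} intact for mod_wsgi while filling in the environment name and the process and thread counts.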

Running ApacheBench through Fabric is also easy to do, but here's a slightly more complex example I put together that saves the results in time-stamped folders, whose names also include the number of requests, concurrency level, process count, and thread count of the test:

def benchmark():
    config = {
        'number': 500,
        'concurrency': 50,
        'url': 'http://my.web.server/path/to/page/',
    }
    # prime the server with a few requests before logging any results
    run('ab -n 10 -c 1 {url}'.format(**config))
    context = dict(env)
    context.update(config)
    context['now'] = datetime.datetime.now().strftime('%Y-%m-%d_%H:%M:%S')
    dir_name = '{now}_n={number},c={concurrency}'
    if 'process_count' in context and 'thread_count' in context:
        dir_name += '_p={process_count},t={thread_count}'
    dir_name = dir_name.format(**context)
    context['test_dir'] = os.path.join('test_runs', dir_name)
    run('mkdir -p {0}'.format(context['test_dir']))
    for x in range(4):
        context['test_file'] = os.path.join(context['test_dir'],
                                            'ab{0}.txt'.format(x))
        run('ab -n {number} -c {concurrency} {url} > '
            '{test_file}'.format(**context))

You can run these commands together to update the Apache configuration and run a benchmark with a single line from the shell, like so:

fab staging update_apache_conf:10,5 benchmark

This would update the Apache configuration on the remote server, run a few requests to prime the server, and then run the specified ApacheBench test 4 times and save the results in text files in a timestamped directory.

To test lots of different server configurations at once with minimal user interaction, you can further script this by wrapping the above command in a Bash for loop, like so:

for process_count in {1..76..5}; do fab staging update_apache_conf:$process_count,1 benchmark; done

This command iterates from 1 through 76, in steps of 5 (1, 6, 11, 16 ... 76), sets the Apache configuration to use that number of processes, and runs a separate benchmark for each configuration.
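Once the result files pile up, you will probably want to compare them. As a rough sketch, the helper below pulls the "Requests per second" figure out of one ApacheBench report. The function name and the sample text are hypothetical, and it assumes the summary line has the usual "Requests per second: N [#/sec] (mean)" shape.

```python
import re

# Matches the summary line ab prints, e.g.
# "Requests per second:    123.45 [#/sec] (mean)"
RPS_RE = re.compile(r"^Requests per second:\s+([\d.]+)", re.MULTILINE)


def requests_per_second(ab_output):
    """Return the mean requests/second from ab output, or None if absent."""
    match = RPS_RE.search(ab_output)
    return float(match.group(1)) if match else None


# Made-up excerpt standing in for the contents of one ab0.txt file.
sample = """\
Concurrency Level:      50
Complete requests:      500
Requests per second:    123.45 [#/sec] (mean)
"""
rps = requests_per_second(sample)
```

Running this over every file in a test_runs directory would give one number per run, which is usually enough to spot where adding processes stops helping.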

Anyway, that's just a little insight into how one might deploy and test a Python or Django application using Fabric and ApacheBench. Hope you find it helpful!

Getting Started using Python in Eclipse

August 31 2011 by Dan Poirier

Eclipse with the PyDev module has a lot to offer the Python programmer these days. If you haven't looked at PyDev before, or not in a while, it's worth checking out.

Here are some of my favorite features:

  • One-keystroke navigation to the definitions of variables, methods, classes
  • Code completion, including automatically adding import statements
  • Clean up imports
  • Refactoring, including renaming across projects
  • Clean up whitespace

There are many more. I recommend taking a look at the PyDev web site and blog to see what might appeal to you.

Getting Eclipse and PyDev

If you're already using Eclipse, you can add PyDev to it. If not, you also have the option to get a version of Eclipse with PyDev already included. You install PyDev into your existing Eclipse the same way you install any other Eclipse add-on: first tell Eclipse where to find the add-on, then install it.

  • In Eclipse 3.6 and 3.7, select Help/Install New Software...
  • On the panel that pops up, click "Add..." at the top right.
  • Enter any name (e.g. "PyDev")
  • Enter http://pydev.org/updates as the Location, then click OK.
  • In the list of available software, select PyDev. 
  • Click Next, Next, accept the license, Finish.
  • If Eclipse asks whether to trust the PyDev certificate, agree.
  • When the install is complete, allow Eclipse to restart.

To get Eclipse with PyDev already installed, go to http://www.aptana.com/products/studio3/download and download Aptana Studio for your platform. Aptana Studio 3.0.4 is Eclipse 3.6 plus PyDev plus other add-ons.

Preferences

There are some preferences in Eclipse you probably want to change if you'll be working with Python.  Open the preferences by selecting Window/Preferences, then use search to find and set these:

  • Insert spaces for tabs: checked, but note that the PyDev editor ignores this and you need to make a similar setting in the PyDev settings for editing Python files.
  • Show whitespace characters:
    • In Eclipse 3.6, you probably want this off except when you're looking for trailing whitespace.
    • In Eclipse 3.7, you can check the box and then click on "whitespace characters" and set just the trailing whitespace visible, which is unobtrusive enough to leave enabled all the time.
  • Replace tabs with spaces when typing: checked.  This is the one that PyDev obeys.
  • Right trim lines: checked, otherwise you end up with a lot of lines with just indentation on them.
  • Add newline at end of file: checked.
  • Auto-Format editor contents before saving: If you check this, every time you save a file PyDev will fix it to comply with the other settings on this preferences page. That's great if you're working on your own project, but not so good if you're doing maintenance on somebody else's project and don't want to make random changes to white-space all over the place.

Explore the other PyDev settings. The "Code Analysis" section is particularly interesting, as it lets you control the kinds of things that PyDev marks as errors or warnings.

Finally, at least one Python interpreter needs to be configured.  Still in Preferences, go to PyDev/Interpreter - Python.  For now, just click "Auto Config" and click OK on the dialog that pops up.  Then click OK to close Preferences.  PyDev will take a while to analyze the Python installation and libraries.

Perspective

Select Window/Open Perspective/Other and choose PyDev.

Starting to use Eclipse and PyDev with a project

I typically use Eclipse with Django projects, though I haven't tried PyDev's Django-specific features yet.

When I want to work with a project in Eclipse, first I check it out locally. Then here are the steps I follow:

  • File/New/Project (not PyDev project, I don't like the PyDev new project wizard)
  • Choose General/Project, click Next
  • Enter a project name
  • Uncheck "use default location" and set the location to the top directory of my project
  • Click Finish
  • Right-click on the project and select PyDev/Set as PyDev Project
  • Right-click on the project and select Properties
  • Go to PyDev - PYTHONPATH
  • In the Source Folders tab, use "Add source folder" to add folders that need to be on your Python path for your project to work.  Often this is either the top-level project folder or a folder immediately inside it.

Using PyDev with virtualenv

If you use virtualenv (and if not, why not?), there are a couple of additional steps to take.

First, add the interpreter from your virtual environment as another Python interpreter:

  • Open Preferences
  • Go to PyDev/Interpreter - Python
  • Click "New..."
  • For the Executable, navigate to your virtual environment's bin directory and select the Python interpreter there.
  • Choose another name for your interpreter if you want, probably something shorter than the default.  I like to use the name of the virtual environment, with "-env" appended.
  • Click OK
  • Now here's the tricky part - a dialog will pop up asking which library folders to add.  Keep the defaults but you also need to add your system Python library directories - e.g. /usr/lib/python2.6, /usr/lib64/python2.6, and /usr/lib/python2.6/plat-linux.  Otherwise PyDev won't be able to find all the libraries your Python interpreter will be using.
  • Click OK

Then, set the new interpreter as the interpreter for your project:

  • Right-click the project and select Properties
  • Go to PyDev - Interpreter/Grammar
  • Under Interpreter, select your new interpreter
  • Click OK

Now PyDev should be able to find any libraries you have installed in the virtual environment when needed. 

If you install additional libraries, you might need to go back to the interpreter definitions, click "Apply", and tell PyDev which interpreters it should scan again. Until you do that, PyDev might not notice your new libraries.

For more information, see http://pydev.blogspot.com/2010/04/pydev-and-virtualenv.html


Caktus Consulting Group Sponsors DjangoCon 2011

August 31 2011 by Nicole Foster

DjangoCon 2011 is coming up next week and I'm excited to announce that Caktus is sponsoring the conference again this year! It is being held once again in beautiful Portland, Oregon from September 5th through the 10th. We've grown quite a bit since last year: nine team members (Colin, Tobias, Karen, Mark, Dan, Scott, George, Caleb and myself) are attending the conference this year.

We are all really excited to hear some great talks, meet other Django developers and learn more about our all time favorite framework. You can read about why we like it so much in our blog post Why Caktus Uses Django. 

Junior Django Developer Wanted

August 09 2011 by Nicole Foster

Caktus is currently seeking a junior Django developer for our team. The ideal candidate would have 6 months of experience building dynamic web applications in any language, at least 3 months of experience using Python and Django, and also have a basic understanding of relational databases such as PostgreSQL and MySQL. The junior developer position will consist of modeling data for complex business ideas, creating and integrating Django applications in new projects, maintaining existing Django projects and also assisting with Django deployments.

If this sounds like something you may be interested in, or you know someone who might be, check out the entire job posting here. Also, if you would like to apply, please submit your resume and code samples to jobs+website@caktusgroup.com.

We're hiring a Front End Web Developer

August 04 2011 by Nicole Foster

I'm excited to announce that Caktus is actively seeking a Front End Web Developer. The position would entail creating wireframes and mock-ups of proposed designs and user stories, performing front end jQuery and JavaScript development, converting PSDs into standards-compliant HTML and CSS, and also cloning repositories and running Django sites locally for development. The position would also consist of doing user experience design for internal and client projects. This is a contract-for-hire position with the potential to become a full time position with benefits. For a more detailed description of what you'd do as a Front End Web Developer here at Caktus, check out the full job posting here.

If you think you might be a great fit or know someone who might be, please send over your resume and links to work you've done to jobs+website@caktusgroup.com. We would love to hear from you!

An alternative RapidSMS router implementation (with Celery!)

July 18 2011 by Colin Copeland

We've been using RapidSMS, a Django-powered SMS framework, more and more frequently here at Caktus. It's evolved a lot over the past year, from being reworked to feel more like a Django app, to merging the rapidsms-core-dev and rapidsms-contrib-apps-dev repositories into a single codebase (no more submodules!), to finally becoming installable via PyPI. The "new core" is in a great state now and is much easier to work with. However, one particular aspect of RapidSMS, the route process, has always been complicated and confusing to deal with. Tobias began the conversation on this issue after returning from a 6-week long UNICEF project in Zambia. He summarized the route process like so:

  • The route process as it currently stands is complicated; it includes a number of threads and the ways in which they interact are not always intuitive
  • If the route process dies unexpectedly, all backends (and hence message processing) are brought offline
  • Automated testing is difficult and inefficient, because the router (and all its threads) needs to be started/stopped for each test

The RapidSMS router is a globally instantiated object that routes incoming messages through each RapidSMS app and sends outgoing messages via installed backends. The run_router management command starts the router process and creates individual threads for each backend defined in the settings module. I'm not entirely certain as to why the route process was originally threaded, but I assume it was designed to more easily integrate blocking backends (like gsm) into RapidSMS. However, with the standardization of Kannel and SMS-based web services, like Twilio, both of which offload the low level communication work, I believe the threading aspect is now less important. So recently, in what started as a proof of concept, we began work on a decoupled router implementation called rapidsms-threadless-router. rapidsms-threadless-router provides a threadless_router app, which removes the threading functionality from the legacy Router class. Rather, all inbound requests are handled via the main HTTP thread. threadless_router attempts to:

  • Make RapidSMS backends more Django-like. Use Django’s URL routing and views to handle inbound HTTP requests
  • Remove clutter and complexity of route process and threaded backends
  • Ease testing – no more threading or Queue modules slowing down tests

In comparison to the legacy route process, threadless_router handles all inbound and outbound backend communication from within the main HTTP thread. Each request creates a new router instance and no separate process or thread is created. This simplifies the Router class significantly. Additionally, threadless_router allows inbound messages to be easily passed off to an asynchronous task queue, such as Celery. Task queues allow message processing to be handled outside of the HTTP request/response cycle, which is perfect for SMS-based applications, as out of band responses are more than acceptable.

threadless_router is not, however, a drop-in replacement for the legacy router. Legacy backends will not work and, as all routing is handled from within the HTTP thread, non-HTTP backends, such as pygsm, are not currently compatible with threadless_router. A simple wrapper around pygsm could be written to talk to the modem and also spin up a simple HTTP server to communicate with RapidSMS. This would decouple pygsm from RapidSMS and let it run as its own separate process. Integrating with supervisord would work great here too. Several contrib applications, such as httptester and scheduler, are also not compatible. We've bundled a new httptester as a replacement and celerybeat can be used to mimic the scheduler functionality. A full list of caveats can be found in the docs.

The full documentation for rapidsms-threadless-router, including installation instructions and examples, can be found on readthedocs.org. If you're already familiar with the internals of RapidSMS and would like to see examples of threadless backend implementations, I suggest reviewing the bundled http and httptester backends and our updated twilio backend.

I would like to mention that Nicolas Pottier and Eric Newcomer created rapidsms-httprouter, which also handles all messages within the main HTTP thread. The main difference between rapidsms-httprouter and rapidsms-threadless-router is that, while httprouter handles inbound messages in a Django view, it still starts up threads (for handling outgoing messages) like the current router (also from within a Django view). Make sure to check it out as well and let us know what you think!

Caktus Lightning Talk Lunches

July 13 2011 by Nicole Foster

[Photo: Mark and Calvin]

Last month we hosted the first talk in our new Caktus Lightning Talk Lunch series. We started this series to get together and learn about new projects, different applications, and interesting topics that the Caktus team has been working on. Lunch was provided from Buns in Chapel Hill and Mark Lavin gave the first talk.

Mark presented on django-selectable, a Django autocomplete app powered by jQuery UI. Similar to django-ajax-selects, django-selectable provides a framework to construct autocompleting text fields, but differs in a few key areas:

  • jQuery's native autocomplete plugin is used, rather than the now defunct bassistance version
  • A declarative, class-based paradigm is used for easily defining and customizing lookup sources. No more template propagation and inline JavaScript!
  • A registration process, similar to the Django admin, is used to enable lookups
  • Many-to-Many field support and multi-selection (via an editable deck list)
  • Multiple selects (and combo boxes) can be added to the same page

You can find Mark’s slides for the talk here. Also, the source code for django-selectable can be found here.

And here are a few pictures from the meeting:

[Photo: Mark]

[Photo: Team]

Everyone had a great time and we are all really excited to see what other interesting topics the team comes up with to talk about.

The Buddha Website wins two Webby Awards

May 18 2011 by Nicole Foster


I am excited to announce that The Buddha site won two Webby Awards! It was built in partnership with Sonnet Media as a companion to The Buddha documentary directed by David Grubin that aired on PBS in April 2010. The site was built using Django to create a multi layered interactive experience to further explore the life and times of the Buddha.

The Buddha won both the People’s Voice Award and the Webby Award in the Religion and Spirituality category.

Congratulations to everyone who worked on the project and thank you so much for voting!

The Buddha Website Nominated for Webby Award

April 12 2011 by Tobias McNulty

I'm delighted to announce that the PBS companion website for The Buddha — a site that Caktus helped build using the Django web framework — has been nominated for a Webby Award in the Religion and Spirituality category! Online voting for the People’s Voice awards is now underway and we would appreciate you voting for the site and encouraging others to do so as well. You can also view the entry page directly here.

We appreciate your support and thank you for helping us honor this wonderful program!

Caktus 2011 Summer Internship Program

March 25 2011 by Nicole Foster

I'm excited to announce that Caktus is launching its summer internship program. It is a 12 week paid position in our Carrboro, NC office. We're within driving distance of UNC Chapel Hill, NC State in Raleigh, and Duke in Durham, so students from all parts of the NC Research Triangle are welcome to apply.

We are looking for a web developer who enjoys working on a team and is excited to work on new and diverse projects. While working with us you will get to work on Django-powered web applications, learn about test driven development and other agile methodologies, perform front-end development in HTML, CSS and JavaScript (jQuery) and become familiar with Linux (Debian-flavor) desktop and server systems. Check out the full posting here.

If you'd like to spend your summer working with some great people on interesting projects please email us at jobs+website@caktusgroup.com with your resume and, if applicable, links to samples of code you have written. Kindly include a brief note describing why you would be a great fit for this opportunity.

Sprinting on Django: A Layperson's Perspective

March 15 2011 by Mark Lavin

We just got back from another fun and successful PyCon. While we didn't get to stay for much of the sprints we did get to spend some time in the Django sprint Sunday and Monday. Monday morning I was there early and I noticed a bit of confusion among the Django sprinters. While I'm not a frequent contributor I've participated a few sprints at previous conferences and local sprints with Caktus. I shared with them my experiences and it seemed generally helpful so I thought I would share them here as well.

If you've ever sprinted at a large conference like PyCon or DjangoCon you've probably heard the speech from the core developers about contributing. It's always nice to hear, but it still doesn't stop people from being unsure about what to do or where to start at the sprints. I'm here to tell you, as another non-core developer, that it really isn't that hard or scary. Django is a big project but you don't have to know everything to be able to help.

Where to start?

To start you should always read the contributing guide. Once you've got the Django source code checked out you should run the test suite just to make sure you know how. Do you have to check out from SVN? No, you don't. There are mirrors on GitHub and BitBucket and you should use the VCS that you are most comfortable with. Just remember to generate your patches from the source root directory and in a format that's compatible with SVN.

How do I find a ticket?

There are a couple different strategies for finding a good ticket to work on. The list of Trac tickets can be intimidating especially if this is your first time working on Django. One way to find a ticket to work on is to find an area that really frustrates you when using Django. Last year at DjangoCon I focused on contrib.sites, contrib.sitemaps and formsets because there were more than a couple issues that I had come across recently working with them. Thankfully we were even able to fix #11418 and #11358.

Some tickets have patches and tests but haven't been reviewed. It's fairly easy to download the patch, apply it and see if it works. You should also spend some time looking to make sure everything works the way the patch submitter claims it works. Either way you should comment on the ticket with your results. Also, some patches are fairly old and won't apply cleanly, but updating the original patch is something else that's easy to do and can be helpful. One common example of this is the recent shift from doctests to unittests. A number of patches that had tests may not apply because they were doctests. Converting those patches to use unittests is another way to help.
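As a rough illustration of that conversion, the sketch below shows the same check written first as a doctest and then as a unittest. The slugify_title helper and the test names are hypothetical and are not taken from Django's code base.

```python
import unittest


def slugify_title(title):
    """Hypothetical helper, used only to have something to test.

    The old-style doctest lives in the docstring:

    >>> slugify_title("Hello World")
    'hello-world'
    """
    return "-".join(title.lower().split())


class SlugifyTitleTests(unittest.TestCase):
    """The same expectation rewritten as unittest methods."""

    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify_title("Hello World"), "hello-world")

    def test_extra_whitespace_is_collapsed(self):
        self.assertEqual(slugify_title("  Hello   World "), "hello-world")


# Run the converted tests without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTitleTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each line of the doctest becomes one assertion, and cases that were awkward to express in a docstring (like the whitespace one) get their own named method.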

Another strategy is to find tickets which have patches but need tests. You can easily filter the Trac tickets by 'Has Patch' and 'Needs Tests'. Writing the tests can help you better understand the underlying code. You might also want to write some patches and if so just filter by the tickets without patches. Here you might also want to filter on areas that you are most comfortable with such as the admin, forms, or ORM. Remember that you need tests with all of your patches and documentation for new features.

Can I change the tickets?

I say don't be afraid to jump in. There is nothing you can do in Trac that can't be undone. While you shouldn't re-open tickets closed by core devs, if there are bugs you can't reproduce don't be afraid to say so. Tobias and I spent over an hour working to reproduce a bug at the last PyCon. In the end we closed the ticket as 'could not reproduce'. Once we did that at least two other people commented in IRC that they had tried and couldn't reproduce it either, but hadn't commented or closed the ticket. So, if you spend time looking at a ticket, do everyone a favor and share what you did. When commenting or closing a ticket remember to always be respectful. Someone was kind enough to take the time to put in a ticket to fix or improve Django and we are all a part of the same community.

The last thing I'll add is that contributing to Django doesn't have to start or end at the sprints. Trac is always available if you have some time to look through tickets. I hope this helps some people get the confidence to write or check patches for Django. Happy sprinting everyone! 

New Job Posting: Linux Systems Administrator with Python/Django experience

March 12 2011 by Tobias McNulty

I'm delighted to announce that we've just published another job posting for a Linux Systems Administrator at Caktus.  The position will involve maintaining existing Linux servers, designing and building highly-scalable deployments, and assistance with Django deployment and development as time permits.  This is a full-time position, with benefits, and is based out of our Carrboro, NC office (a short drive from Raleigh, Durham, and Chapel Hill).

For more information, follow the link on our careers page and let us know if you or someone you know might be interested in the position!

Caktus Consulting Group Sponsors PyCon 2011

March 09 2011 by Tobias McNulty

PyCon 2011 Atlanta is just around the corner, and I'm proud to announce that Caktus is a gold sponsor at the conference this year! We sponsored DjangoCon in both 2009 and 2010, and this year agreed to extend that support to the Python community in general.

PyCon US is the annual gathering of software developers who use the open source, Python programming language.  Django, our web framework of choice, is written in Python, so we use the language every day here at Caktus to create custom web applications and dynamic, content-rich web sites. Additionally, starting last year, we've put some of that knowledge to use extending and developing applications for the RapidSMS framework - a tool for creating mobile health and data collection applications that integrate web and mobile components (via SMS).

This year, the conference is being held March 9th through the 17th, 2011 in Atlanta, Georgia. We've grown a little since last year at this time; 7 Caktus team members—Colin, Karen, Mark, Mike, Calvin, Nicole, and myself—will be attending the conference. We're thrilled to be going again this year and hope to see you there!

We're currently looking for a Django developer to join the team, so stop by and introduce yourself if you or someone you know might be interested in the position!

New Careers Page Inaugurated with Django Job Posting

February 09 2011 by Tobias McNulty

I'm pleased to announce that we just released a new Careers section of our web site here at Caktus.  The section has been inaugurated with a new posting for a full-time Django developer position based out of our Carrboro, NC office (not far from Raleigh, Durham, or Chapel Hill), so kindly check it out and let us know if you or someone you know might be a good fit!

HIV Results, Birth Reminders, and Clinic Communication in Malawi

December 29 2010 by Tobias McNulty

I recently returned from a 6 week trip in Malawi, where I was heavily involved in the implementation and deployment of Project Mwana, an Information and Communication Technology (ICT) project focused on Maternal and Newborn Child Health (MNCH). The project is currently running as a pilot in both Zambia and Malawi. This post is a fairly technical overview of what the project does and the way in which it was developed.

The project aims to facilitate several things, including (a) secure delivery of HIV (Dry Blood Spot, or DBS) test results from the lab to health clinics by SMS, which we’ve named “Results160”, (b) appointment reminders for newborn children, or “RemindMi” (Mi = mothers & infants), and (c) free-text “chat” for health clinic workers and Community Health Workers, to strengthen communication and patient tracing.

Source Code

The source code for the project, which is based on Django and RapidSMS, can be found on GitHub:

I updated the developer setup instructions fairly recently, so, if you’re a developer interested in this project or line of work, you should be able to get a local copy up and running without too much trouble. If you try and do have any issues, please let me know!

Team Composition

In Zambia, we have 2 on-going local developers, 1 temporary lead developer, 1 on-going local project manager, and 1 on-going project mentor. The team is similar in Malawi, except we have 1 on-going local developer, 1 temporary lead developer (myself), 1 on-going local project manager, and 1 on-going project mentor.

Development Workflow

For this phase of the project we adopted “git flow” to help guide our development workflow, and I think it was a big success (more below). See the following links for more information:

Code Organization

  • apps/ - we tried to separate functionality into separate Django/RapidSMS apps as much as possible.
  • backends/ - contains a RapidSMS backend for communicating with Kannel (an open source SMS gateway)
  • locale/ - translation files for Bemba and Chichewa
  • requirements/ - pip requirements files & corresponding tarballs (we found that checking in the tarballs was crucial for easily re-creating a development or production environment in low-bandwidth situations)
  • malawi/ - Malawi-specific configuration files & code
  • zambia/ - Zambia-specific configuration files & code

Settings Files

We took a strongly hierarchical approach to the settings files, to make it easy to share/override settings as necessary:

 
settings_project.py
  \-> malawi/settings_country.py
         \-> malawi/settings_staging.py
         \-> malawi/settings_production.py
         \-> localsettings.py
  \-> zambia/settings_country.py
         \-> zambia/settings_staging.py
         \-> zambia/settings_production.py
         \-> localsettings.py

Each “sub” settings file simply imports from its “parent” at the top of the file, thereby allowing you to, for example, insert or append an app, add or remove a middleware, etc.
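A minimal sketch of that import chain follows. The module names are flattened and prefixed with demo_ so the example can run on its own, and the setting values are illustrative assumptions rather than the project's real configuration.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-ins for settings_project.py, <country>/settings_country.py
# and <country>/settings_staging.py. Values are made up for illustration.
SOURCES = {
    "demo_settings_project.py": (
        "INSTALLED_APPS = ['rapidsms', 'labresults']\n"
        "LANGUAGE_CODE = 'en'\n"
    ),
    "demo_settings_country.py": (
        "from demo_settings_project import *  # start from the parent\n"
        "INSTALLED_APPS = INSTALLED_APPS + ['reminders']  # append an app\n"
        "LANGUAGE_CODE = 'ny'  # override a value\n"
    ),
    "demo_settings_staging.py": (
        "from demo_settings_country import *\n"
        "DEBUG = False\n"
    ),
}


def load_demo_settings():
    """Write the three files to a temp dir and import the most specific one."""
    tmpdir = tempfile.mkdtemp()
    for name, source in SOURCES.items():
        with open(os.path.join(tmpdir, name), "w") as handle:
            handle.write(source)
    sys.path.insert(0, tmpdir)
    importlib.invalidate_caches()
    try:
        return importlib.import_module("demo_settings_staging")
    finally:
        sys.path.remove(tmpdir)


settings = load_demo_settings()
print(settings.INSTALLED_APPS)
print(settings.LANGUAGE_CODE)
```

The most specific file sees everything its ancestors defined, so it only needs to state what differs.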

Apps

The project is divided up into the following main Django/RapidSMS apps: Results160 is implemented (mostly) in the “labresults” app, RemindMi in the “reminders” app, and the clinic chat in the “broadcast” app. The other apps support various pieces of functionality such as location management (“locations”), additions to the contact model (“contactsplus”), “stringcleaning,” and SMS printer integration (“tlcprinters”). I won’t pretend that they’re at all pluggable at the moment, but I’m sure that some of the more useful parts could be extracted & made independent, should the need arise.

Lessons learned

On the Malawi side, we learned a few things about building RapidSMS (or mobile health ICT projects in general) that I think are worth sharing:

(1) Regular meetings. Have quick meetings to review progress and plan next actions. Scrum type meetings really helped us review and quickly narrow down and fix things that we saw were not going too well before they became a problem.

(2) Pair Programming. Even though we were often pressed for time, pair programming proved very beneficial to the team and the project in general. Part of the mandate for the project is creating local capacity, and hands-on pair programming allowed the newer or more junior local developers to both gain a good overview of the codebase and learn from the more seasoned Python/Django developers on the team.

(3) Feature Branches. git-flow made them easy, which was key because we had two teams working simultaneously in two different countries (between which communication was difficult at times), and we wanted to share most of what we were doing while also sheltering each other from potentially buggy commits of partially implemented features.

(4) Using a single code base for two deployments. It was difficult at times, when our two teams were changing different parts of the code and inadvertently breaking parts of it, but overall I think it was a big win & well worth the effort. We get the benefit of shared features & bug fixes, we don’t have to deal with maintaining separate forks, and it forces us to optimize our development workflow and make our code that much more configurable.

(5) Server environment. Get a public facing IP for your server and/or work location before beginning the implementation. We didn’t have one of these to start with in Zambia, while we did in Malawi. It makes life far, far easier for a number of reasons, including (a) it lets stakeholders get to the server (obviously), (b) it helps you be sure any connectivity issues are on the telco side, not yours, and (c) it lets you avoid corporate firewalls.

(6) SMS Gateway. If you have an SMS requirement, use Kannel. We started out with pygsm and our MultiTech modem would regularly get into a weird state where it was registered on the network & responded to AT commands, but no messages would come through. With other modems, the gsm backend didn’t delete messages from the SIM card, so it quickly filled up. After we switched to Kannel, we only had one case of downtime — and it was the RapidSMS route process, not Kannel, that was at fault. We also implemented the project on two different network carriers: Zain (now Airtel) via a GSM modem, and TNM (over the internet via SMPP). Kannel gave us a unified interface with which to interact with the two different backends. Kannel was also valuable as a “reference product” with which to test out the SMPP connection provided to us by TNM; as it turned out, when we couldn’t connect to it at first, the issue was on their end, not ours, and having a second opinion to back up our suspicions was key.

For more information, please see our page on Project Mwana or get in touch with a member of the Caktus team!

Simplifying the Testing of Unmanaged Database Models in Django

September 24 2010 by Tobias McNulty

Sometimes, when building a web application in Django, one needs to connect to a legacy database whose tables already exist. To support this use case, Django has the concept of "unmanaged models," which let you connect the Django ORM to tables that it assumes already exist (and therefore does not attempt to create).

This can make automated testing---which is something we take seriously at Caktus---rather difficult, because you might not have the SQL on hand to create an empty copy of the legacy database for testing purposes. One solution is to automatically set all your unmanaged models to "managed" during a test run, so that Django will happily create the tables for you. Typically this is enough to allow you to add sample data to the database and write tests as you would for any other model in Django. We've also found the approach to work especially well for database views (which typically are manifested as unmanaged models in Django), because it may be easier to test the code that uses the view by treating it as a table during automated testing.

There's a great snippet available for doing this, but the code is lengthy and basically requires copying and pasting a large portion of the existing test runner in Django. Django 1.2, however, introduces a new class-based test runner that's much better suited for small modifications to the testing process like this.

To give it a try, I wrote a short piece of code that accomplishes this---making all unmanaged models in your Django project "managed" for the duration of the test run:

from django.test.simple import DjangoTestSuiteRunner


class ManagedModelTestRunner(DjangoTestSuiteRunner):
    """
    Test runner that automatically makes all unmanaged models in your Django
    project managed for the duration of the test run, so that one doesn't need
    to execute the SQL manually to create them.
    """
    def setup_test_environment(self, *args, **kwargs):
        from django.db.models.loading import get_models
        self.unmanaged_models = [m for m in get_models()
                                 if not m._meta.managed]
        for m in self.unmanaged_models:
            m._meta.managed = True
        super(ManagedModelTestRunner, self).setup_test_environment(*args,
                                                                   **kwargs)

    def teardown_test_environment(self, *args, **kwargs):
        super(ManagedModelTestRunner, self).teardown_test_environment(*args,
                                                                      **kwargs)
        # reset unmanaged models
        for m in self.unmanaged_models:
            m._meta.managed = False
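To put the runner to use, Django's TEST_RUNNER setting can point at the class by its dotted path. The fragment below is a sketch; the module path myproject.testrunner is a hypothetical location, so substitute wherever the class lives in your project.

```python
# settings.py (fragment)
# "myproject.testrunner" is a hypothetical module path; adjust it to the
# module in which ManagedModelTestRunner is actually defined.
TEST_RUNNER = "myproject.testrunner.ManagedModelTestRunner"
```

With that in place, running the usual test management command should pick up the custom runner.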

Enjoy! Don't hesitate to comment with any questions or concerns.

Caktus Consulting Group Seeks Two Python/Django Web Developers

September 03 2010 by Tobias McNulty

I'm delighted to announce that Caktus is looking for two Python and/or Django web developers to join our team on a contract or part-time basis, with the potential for full-time work in the future.

Caktus builds custom web applications for local and remote clients using a variety of open-source technologies. We are a small team based in the Chapel Hill/Carrboro area of North Carolina (currently residing in Carrboro Creative Coworking). We believe in face-to-face contact, both with clients and amongst ourselves, and employ agile development techniques that emphasize teamwork and collaboration. We encourage you to meet the team and learn more about what we do.

We're looking for two experienced Python and/or Django web developers who enjoy working on a team and are excited to work on new projects. We have a preference for local candidates, but will consider all submissions. Your work will involve creating and integrating Django apps, working on existing Django projects, deployment, and database work.

You will be working in Linux (Debian-flavor) production environments with Apache and WSGI. Python and relational database experience is required. Django experience is a (big) plus. HTML/CSS and JavaScript experience are also a must, and jQuery is a plus.

If you're interested in one of these positions, please send us your resume, some sample Python code that you wrote, and links to any open-source projects you've contributed to. We're looking forward to meeting you!

Caktus Consulting Group Sponsors DjangoCon 2010

August 26 2010 by Tobias McNulty

DjangoCon 2010 is just around the corner, and I'm proud to announce that Caktus is sponsoring the conference again this year!

DjangoCon is the annual gathering of software developers who use the open source, Python-based Django web framework. We use the framework every day here at Caktus to create custom web applications and dynamic, content-rich web sites. Additionally, starting this year, we've put some of that knowledge to use extending and developing applications for the RapidSMS framework. For more information about why we use Django and think it's so great, check out our blog post titled Why Caktus Uses Django.

This year, the conference is being held again the week of September 6th in the beautiful city of Portland, Oregon. We've grown a little since last year at this time; it looks like 6 Caktus team members—Colin, Alex, Karen, Mark, Mike, and myself—will be attending the conference. We're positively thrilled to be going again this year and we hope to see you there!

Caktus Consulting Group Welcomes Lead Developer Karen Tracey

August 12 2010 by Tobias McNulty

I'm delighted to welcome Karen Tracey to our growing team of web developers here at Caktus. Karen is a core developer of the Django web framework and specializes in the development and testing of applications for the web. She is also the author of Django 1.1 Testing and Debugging, published by Packt Publishing in April, 2010.

Caktus is a seasoned team of web developers that creates interactive, content-rich sites and applications with the Django web framework. We put a strong emphasis on best practices, employ an agile method, and also actively participate in the Django development community.

For more information about Caktus and our team, check out our newly updated team page!

Expanded services, portfolio, and more in the new Caktus web site

June 07 2010 by Tobias McNulty

We're pleased to announce the release of the latest and greatest Caktus web presence yet. This edition features an enhanced services section and portfolio. Among other things, the new site demonstrates how our Django-based content management system can be used to connect related pages in customized, innovative ways.

In addition to the Django web apps we've been building for years here at Caktus, we're now offering a related service in mobile health development. Using RapidSMS, a communications framework built on Django, we build applications that add an SMS (text message) component to the standard Django stack. This allows non-standard users, such as clinic and community health workers, to interact with the system. Coupling high-tech and low-tech in this way lets us help remedy communication problems in places where Internet (and even power) are not widely available.

Check out the new site, and please let us know if you have any questions or comments!

Basic Django deployment with virtualenv, fabric, pip and rsync

April 22 2010 by Colin Copeland

Deployment is usually a tedious process with lots of tinkering until everything is set up just right. We deploy quite a few Django sites on a regular basis here at Caktus and still do some tinkering, but we've attempted to functionalize some of the core tasks to ease the process. I've put together a basic example that outlines local and remote environment setup. This is a simplified example and just one of many ways to deploy a Django project (I learned a lot from Jacob Kaplan-Moss' django-deployment-workshop), so I encourage you to browse around the Django community to learn more. The entire source for this example project can be found in the caktus-deployment Bitbucket repository.

Local Development Environment

The project directory is organized like so:

caktus_website/
    __init__.py
    apache/
        staging.conf    -- staging Apache conf
        staging.wsgi    -- staging wsgi file
    blog/
    bootstrap.py        -- bootstrap local environment
    fabfile.py          -- manage remote environments with fabric
    local_settings.py
    manage.py
    media/
    requirements/
        apps.txt        -- pip requirements file
    settings.py
    settings_staging.py -- staging settings file
    urls.py

To set up a local development environment, we'll create a virtual environment and run bootstrap.py, which is just a simple script that automates installing Python dependencies using pip:

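import os
import subprocess
import sys
from optparse import OptionParser

# Assumed stand-in for the parser the full script defines elsewhere.
parser = OptionParser(usage="%prog")
if "VIRTUAL_ENV" not in os.environ:
    sys.stderr.write("$VIRTUAL_ENV not found.\n\n")
    parser.print_usage()
    sys.exit(-1)
virtualenv = os.environ["VIRTUAL_ENV"]
file_path = os.path.dirname(__file__)
# Install everything listed in the requirements file into the virtualenv.
subprocess.call(["pip", "install", "-E", virtualenv, "--requirement",
                 os.path.join(file_path, "requirements/apps.txt")])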

bootstrap.py uses requirements/apps.txt (a pip requirements file), so you can source anything off of PyPI as well as mercurial, git, and SVN repositories that include setup.py files. In this example, django's SVN is the only dependency in apps.txt:

-e svn+http://code.djangoproject.com/svn/django/branches/releases/1.1.X#egg=django

bootstrap.py must be run within a virtual environment, so let's create a new virtualenv (I recommend using virtualenvwrapper) and then run bootstrap.py to install the dependencies:

copelco@montgomery:~/caktus_website$ mkvirtualenv --distribute caktus
(caktus)copelco@montgomery:~/caktus_website$ ./bootstrap.py

Now that our environment is set up (and Django is on the Python path), we can run normal Django management commands:

(caktus)copelco@montgomery:~/caktus_website$ ./manage.py syncdb --settings=caktus_website.local_settings
(caktus)copelco@montgomery:~/caktus_website$ ./manage.py runserver --settings=caktus_website.local_settings

Great! That's it for our local setup. Now let's look into deploying the project to a staging server.

Deployment and Remote Management

To help provision the remote server environment (in this case Ubuntu 9.10), we'll use fabric. fabric allows you to streamline deployment by functionalizing common tasks in Python. I've created an example fabfile.py to help bootstrap and deploy the project:

(caktus)copelco@montgomery:~/caktus_website$ fab --list
Available commands:

    apache_reload        reload Apache on remote host
    apache_restart       restart Apache on remote host
    bootstrap            initialize remote host environment (virtualenv, dep...
    configtest           test Apache configuration
    create_virtualenv    setup virtualenv on remote host
    deploy               rsync code to remote host
    production           use production environment on remote host
    staging              use staging environment on remote host
    symlink_django       create symbolic link so Apache can serve django adm...
    touch                touch wsgi file to trigger reload
    update_apache_conf   upload apache configuration to remote host
    update_requirements  update external dependencies on remote host

The fabfile splits the deployment process into discrete steps: 1) virtual environment creation, 2) code transfer, and 3) updating the Python dependencies. The bootstrap command wraps everything together, including initial directory creation, so you can set up the server quickly:

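# Excerpt from fabfile.py. RSYNC_EXCLUDE, touch() and the staging/production
# tasks are defined elsewhere in the full file.
import os

from fabric import utils
from fabric.api import cd, env, require, run
from fabric.contrib import console
from fabric.contrib.project import rsync_project


def bootstrap():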
    """ initialize remote host environment (virtualenv, deploy, update) """
    require('root', provided_by=('staging', 'production'))
    run('mkdir -p %(root)s' % env)
    run('mkdir -p %s' % os.path.join(env.home, 'www', 'log'))
    create_virtualenv()
    deploy()
    update_requirements()


def create_virtualenv():
    """ setup virtualenv on remote host """
    require('virtualenv_root', provided_by=('staging', 'production'))
    args = '--clear --distribute'
    run('virtualenv %s %s' % (args, env.virtualenv_root))


def deploy():
    """ rsync code to remote host """
    require('root', provided_by=('staging', 'production'))
    if env.environment == 'production':
        if not console.confirm('Are you sure you want to deploy production?',
                               default=False):
            utils.abort('Production deployment aborted.')
    extra_opts = '--omit-dir-times'
    rsync_project(
        env.root,
        exclude=RSYNC_EXCLUDE,
        delete=True,
        extra_opts=extra_opts,
    )
    touch()


def update_requirements():
    """ update external dependencies on remote host """
    require('code_root', provided_by=('staging', 'production'))
    requirements = os.path.join(env.code_root, 'requirements')
    with cd(requirements):
        cmd = ['pip install']
        cmd += ['-E %(virtualenv_root)s' % env]
        cmd += ['--requirement %s' % os.path.join(requirements, 'apps.txt')]
        run(' '.join(cmd))

To bootstrap the staging environment, run:

(caktus)copelco@montgomery:~/caktus_website$ fab staging bootstrap

This will run a few commands over SSH and rsync the project directory to a specific location on the staging server. Using rsync is just one of many ways to transfer code to the server; pulling code from a remote repository is another. The "deploy" task in the fabfile can be modified to perform almost any transfer task. Once the bootstrap process is complete, the directory structure will look like so:

home/
    caktus/
        www/
            staging/
                env/               -- virtual environment
                    bin/
                    include/
                    lib/           -- contains site-packages
                    source/        -- contains django src
                caktus_website/
                    ...
                    apache/
                    manage.py
                    requirements/
                    ...
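The layout above hints at how the staging and production tasks might fill in the env paths that the other tasks require. The helper below is a hypothetical sketch of that derivation in plain Python; it is not the code from the example fabfile, and the function name and defaults are assumptions.

```python
import posixpath


def environment_paths(home, environment, project="caktus_website"):
    """Derive remote paths for one environment (hypothetical helper).

    The keys mirror the env attributes that the fabfile tasks require:
    root, code_root and virtualenv_root.
    """
    root = posixpath.join(home, "www", environment)
    return {
        "root": root,
        "code_root": posixpath.join(root, project),
        "virtualenv_root": posixpath.join(root, "env"),
        "log_dir": posixpath.join(home, "www", "log"),
    }


paths = environment_paths("/home/caktus", "staging")
for name in sorted(paths):
    print("%s = %s" % (name, paths[name]))
```

Keeping every path a function of the environment name is what lets the same tasks serve both staging and production.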

Now SSH to the server and run syncdb within the newly created virtual environment:

caktus@pike:~/www/staging/caktus_website$ source ../env/bin/activate
(env)caktus@pike:~/www/staging/caktus_website$ ./manage.py syncdb --settings=caktus_website.settings_staging

The staging settings file is set up to use sqlite3 to simplify this deployment example. In practice we use PostgreSQL in our production environments, but database setup is for another blog post! To get Apache configured using mod_wsgi, we'll point the Apache configuration to the staging.wsgi file using the WSGIScriptAlias directive. Here's an example Apache configuration to get a barebones Django environment up and running:

<VirtualHost *:80>
    WSGIScriptReloading On
    WSGIReloadMechanism Process
    WSGIDaemonProcess caktus_website-staging
    WSGIProcessGroup caktus_website-staging
    WSGIApplicationGroup caktus_website-staging
    WSGIPassAuthorization On

    WSGIScriptAlias / /home/caktus/www/staging/caktus_website/apache/staging.wsgi/

    <Location "/">
        Order Allow,Deny
        Allow from all
    </Location>

    <Location "/media">
        SetHandler None
    </Location>
    Alias /media /home/caktus/www/staging/caktus_website/media

    <Location "/admin-media">
        SetHandler None
    </Location>
    Alias /admin-media /home/caktus/www/staging/caktus_website/media/admin

    ErrorLog /home/caktus/www/log/error.log
    LogLevel info
    CustomLog /home/caktus/www/log/access.log combined
</VirtualHost>

We'll use Apache to serve static media (both local and admin media) and direct everything else to the Django instance through mod_wsgi. In order for the wsgi instance to be aware of our environment and project directory, we need to add the virtual environment's site-packages directory and the project directory to the Python path, and tell Django which settings file to use by setting the DJANGO_SETTINGS_MODULE environment variable:

import os
import sys
import site

PROJECT_ROOT = os.path.dirname(os.path.dirname(os.path.dirname(__file__)))
site_packages = os.path.join(PROJECT_ROOT, 'env/lib/python2.6/site-packages')
site.addsitedir(os.path.abspath(site_packages))
sys.path.insert(0, PROJECT_ROOT)
os.environ['DJANGO_SETTINGS_MODULE'] = 'caktus_website.settings_staging'

import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
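To see why the wsgi file calls dirname three times, here is a small sketch that walks the same steps using the staging path from the Apache configuration. The intermediate variable names are mine, added only to label each step.

```python
import posixpath

# Location of the wsgi file on the staging server (from the Apache config).
wsgi_file = "/home/caktus/www/staging/caktus_website/apache/staging.wsgi"

apache_dir = posixpath.dirname(wsgi_file)       # .../caktus_website/apache
project_dir = posixpath.dirname(apache_dir)     # .../caktus_website
project_root = posixpath.dirname(project_dir)   # .../staging

# The virtualenv sits next to the project directory, under project_root.
site_packages = posixpath.join(project_root,
                               "env/lib/python2.6/site-packages")

print(project_root)
print(site_packages)
```

Putting project_root (rather than the project directory itself) on the path is what makes the caktus_website package importable, which the settings module name relies on.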

Now just upload the staging apache configuration and reload apache:

(caktus)copelco@montgomery:~/caktus_website$ fab staging update_apache_conf

That's it! The site should be up and running on your server's public IP. If you run into any trouble (like a 500 Internal Server Error), just tail the Apache error.log; it'll usually point you in the right direction.

Caktus Consulting Group hosts 2nd Django sprint in NC Triangle area

March 16 2010 by Tobias McNulty

Django is a tool we use every day to build fantastic web apps here at Caktus, and a development sprint is a concerted, focused period of time in which developers meet in the same space to get things done on a project.

We're proud to announce that Caktus is hosting another local Django development sprint in the Triangle (Raleigh, Durham, and Chapel Hill/Carrboro) area of North Carolina. The sprint will be held the weekend of March 20th and 21st in Carrboro Creative Coworking, and the purpose of this sprint will be to help push out bug fixes in preparation for the upcoming Django 1.2 release.

If you're interested in attending, no previous experience contributing to Django is necessary and the sprint will be a great opportunity to start. Work on other open source Django-based projects is welcome too. For more information, check out the corresponding wiki page.

We'll be there to open the doors at 9am both days. Courtesy of our sponsors there will be free drinks, snacks, and lunch to go around. Hope to see you there!

Decoupled Django Apps and the Beauty of Generic Relations

March 11 2010 by Tobias McNulty

Like just about everyone else, we've written our own suite of tools to help with building complex content management systems in Django here at Caktus. We reviewed a number of the existing CMSes out there, but in almost every case the navigation and page structure were so tightly coupled the system broke down when it came time to add additional, non-CMS pages.

We wrote a few little apps, django-pagelets, django-treenav, and django-crumbs, each of which manages different pieces of content (little snippets of content, full CMS pages, navigation, and breadcrumbs). All of the apps are available for free under an open source license on Google Code.

Decoupling was a great move for us, and the ability to plug and play any single part of the system is a huge benefit. Sometimes, however, the completely decoupled architecture was a bit of a pain: If we didn't provide a link from the pagelets app to the treenav app, how would it be possible to edit a page's corresponding navigation item on its change form in the Django admin interface?

Enter Generic Relations. Using Django's content types framework, it's possible to create admin inlines for generic relations with just a few simple lines of code.

In this case, I'll show how we allowed users to edit a page's corresponding navigation item in django-pagelets without requiring everyone (i.e., those who don't need it) to install django-treenav. First, define the generic inline in the admin.py file of the app that contains the model you want to link to:

from django.contrib.contenttypes import generic

from treenav import models as treenav


class GenericMenuItemInline(generic.GenericStackedInline):
    """
    Add this inline to your admin class to support editing related menu items
    from that model's admin page.
    """
    max_num = 1
    model = treenav.MenuItem

Then, inside the Admin class for the related model in question, dynamically import and add GenericMenuItemInline to the admin's list of inlines based on whether or not it's in the project's INSTALLED_APPS:

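from django.conf import settings
from django.contrib import admin

class PageAdmin(admin.ModelAdmin):
    # ...
    inlines = [MyOtherInline]  # any inlines the admin already uses
    if 'treenav' in settings.INSTALLED_APPS:
        # only import from treenav when the app is actually installed
        from treenav.admin import GenericMenuItemInline
        inlines.insert(0, GenericMenuItemInline)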
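
The same "only if installed" idea can be pulled into a small helper. The following is a minimal sketch rather than code from django-pagelets: optional_attr and build_inlines are hypothetical names, and the check assumes INSTALLED_APPS is a plain list of app names like 'treenav'.

```python
from importlib import import_module


def optional_attr(app_label, dotted_path, installed_apps):
    """Return the object at dotted_path if app_label is installed, else None."""
    if app_label not in installed_apps:
        return None
    module_path, attr_name = dotted_path.rsplit('.', 1)
    # The import only happens once we know the optional app is present.
    return getattr(import_module(module_path), attr_name)


def build_inlines(base_inlines, installed_apps):
    """Prepend the treenav inline when treenav is installed (hypothetical helper)."""
    inlines = list(base_inlines)
    menu_inline = optional_attr(
        'treenav', 'treenav.admin.GenericMenuItemInline', installed_apps)
    if menu_inline is not None:
        inlines.insert(0, menu_inline)
    return inlines
```

In an admin class this would be used as inlines = build_inlines([MyOtherInline], settings.INSTALLED_APPS). When treenav is missing, the list comes back unchanged and nothing is imported.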

For more information, see the corresponding pagelets admin.py and treenav admin.py. Thanks for reading and don't hesitate to post comments if you have any questions!

Continuous Integration with Django and Hudson CI (Day 1)

March 08 2010 by Colin Copeland

We're always looking for new tools to make our development environment more robust here at Caktus. We write a lot of tests to ensure proper functionality as new features land and bug fixes are added to our projects. The next step is to integrate with a continuous integration system to automate the process and regularly check the status of the build.

After attending Dr. C. Titus Brown's "Why not run all your tests all the time? A study of continuous integration systems." talk at PyCon and seeing Django's Hudson setup, I figured I'd take a look at Hudson CI.

Installing Hudson and basic setup

Hudson is very easy to set up. I started with a fresh Ubuntu 9.10 install on the smallest Rackspace cloud instance and had it running after a few commands. I followed the Debian setup instructions, which basically consist of:

$ wget -O - http://hudson-ci.org/debian/hudson-ci.org.key | sudo apt-key add -
$ echo "deb http://hudson-ci.org/debian binary/" >> /etc/apt/sources.list
$ apt-get update
$ aptitude install hudson
$ apt-get upgrade

That's it! It's already up and running on port 8080 using its own web server. Go ahead and pull it up in your browser.

As a test, let's set up django-crm (a Caktus open-source community project) as our first Hudson job. Click "New Job", type in a job name, click "Build a free-style software project", and hit OK. django-crm contains a sample project that we'll use to run the test suite. On the job configuration page, check Subversion in the Source Code Management section and type in the Repository URL (http://django-crm.googlecode.com/svn/trunk/sample_project, as it appears in the console output below).

Click Save, run the job by clicking "Build Now", and check out the Console Output:

Started by user anonymous
Checking out a fresh workspace because /var/lib/hudson/jobs/django-crm/workspace/sample_project doesn't exist
Checking out http://django-crm.googlecode.com/svn/trunk/sample_project
A         manage.py
A         site_media
A         site_media/css
A         site_media/css/jquery.autocomplete.css
A         site_media/css/django-contactinfo.css
A         site_media/js
A         site_media/js/jquery-ui-1.7.2.custom.min.js
A         site_media/js/jquery-1.3.2.min.js
A         site_media/js/django-crm.js
A         site_media/js/jquery.autocomplete.min.js
...
Finished: SUCCESS

Cool, now let's run some tests. To keep things simple, let's grab Django and a few dependencies using aptitude:

$ wget http://www.djangoproject.com/download/1.1.1/tarball/
$ tar xzvf Django-1.1.1.tar.gz
$ cd Django-1.1.1
$ sudo python setup.py install
$ aptitude install python-dev python-imaging python-setuptools python-pip

To run the tests, add an "Execute shell" build step in the Build section with this command:

#!/bin/bash -ex
cd sample_project
python manage.py test crm

Run the job again and look for the test results in the console output:

[workspace] $ /bin/sh -xe /tmp/hudson6670261053226891793.sh
+ cd sample_project
+ python manage.py test crm
...
Finished: SUCCESS

XML Test output

To integrate Hudson with the Django test suite, I used unittest-xml-reporting. Just "pip install unittest-xml-reporting" and add the following lines to your settings file:

TEST_RUNNER = 'xmlrunner.extra.djangotestrunner.run_tests'
TEST_OUTPUT_VERBOSE = True
TEST_OUTPUT_DESCRIPTIONS = True
TEST_OUTPUT_DIR = 'xmlrunner'

Then check "Publish JUnit test result report" in the Post-build Actions section and add the path to the test XML output, "sample_project/xmlrunner/*.xml".

Run the job and you should see a new "Test Result" link in the navigation. Now you can view the test results right in your browser window.
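
If you want to sanity-check the XML before pointing Hudson at it, the reports are easy to read with the standard library. This is a minimal sketch under stated assumptions: SAMPLE_REPORT is a hand-written JUnit-style document (not real output from django-crm), and summarize is a hypothetical helper that only relies on the tests, failures, and errors attributes of each testsuite element.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the JUnit style; real reports carry more detail.
SAMPLE_REPORT = """<testsuite name="crm.tests.ContactTestCase" tests="3" failures="1" errors="0" time="0.42">
  <testcase classname="crm.tests.ContactTestCase" name="test_create" time="0.10"/>
  <testcase classname="crm.tests.ContactTestCase" name="test_edit" time="0.12"/>
  <testcase classname="crm.tests.ContactTestCase" name="test_delete" time="0.20">
    <failure type="AssertionError" message="302 != 200"/>
  </testcase>
</testsuite>
"""


def summarize(xml_text):
    """Add up the test, failure, and error counts of every testsuite element."""
    root = ET.fromstring(xml_text)
    totals = {'tests': 0, 'failures': 0, 'errors': 0}
    # iter() also yields the root itself, so a bare <testsuite> document works
    # as well as one that wraps several suites in a parent element.
    for suite in root.iter('testsuite'):
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    return totals
```

Calling summarize(SAMPLE_REPORT) gives the totals for the sample; for real files you would read each one in the xmlrunner directory and pass its contents in.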

Coverage

To add coverage reports, I used Ned Batchelder's coverage.py (pip install coverage). Navigate to Hudson's plugin manager (Hudson -> Manage Hudson -> Manage Plugins), install the Cobertura Plugin, and restart Hudson when prompted. Then modify your shell script like so:

#!/bin/bash -ex
cd sample_project
coverage run manage.py test crm
coverage xml --omit=/usr/

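This will generate an XML coverage report in the working directory, so we just need to tell Hudson where to look for it. Check "Publish Cobertura Coverage Report" in the Post-build Actions section and enter the path to the report. coverage.py writes coverage.xml to the current directory by default, so with the script above the path is sample_project/coverage.xml.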

Run the build again and you should have access to a new "Coverage Report" link.

More to come...

This was just a simple example of getting Hudson set up with a Django project, and I know a lot more can be done with Hudson (check out the large number of available plugins). The top items on my todo list are: have Hudson set up environments with virtualenv and pip, integrate more closely with the test suite (possibly using nose), check for PEP compliance, and set up build failure notifications. I hope to write more as I continue to set up our Hudson environment!

Caktus Sends Team of Five to PyCon 2010 in Atlanta

February 17 2010 by Tobias McNulty

Python and Django are tools we use on a daily basis to build fantastic web apps here at Caktus. I'm pleased to announce that Caktus is sending five developers--Colin, Alex, Mike, Mark, and myself--to PyCon 2010! PyCon is an annual gathering for users and developers of the open source Python programming language. This year the US conference is being held in Atlanta, GA. We'll be driving down tomorrow (Thursday) from Chapel Hill, NC and staying for the conference weekend plus one day of the sprints.

Hope to see you there!

Caktus Consulting Group hosts Django sprint in Triangle, NC area

December 06 2009 by Tobias McNulty

Django is a tool we use every day to build rock-solid web apps here at Caktus, and a development sprint is a concerted, focused period of time in which developers meet in the same space to get things done on a project.

We're proud to announce that Caktus is hosting a local Django development sprint in the Triangle (Raleigh, Durham, and Chapel Hill/Carrboro) area of North Carolina. The sprint will be held the weekend of December 12th and 13th at Carrboro Creative Coworking, and its purpose will be to help finish features and push out bug fixes in preparation for the upcoming Django 1.2 release.

If you're interested in attending, no previous experience contributing to Django is necessary and the sprint will be a great opportunity to start. Work on other open source Django-based projects is welcome too. For more information, check out the corresponding wiki page and don't forget to register for the event.

We'll be there to open the doors at 9am both days. Courtesy of our sponsors there will be free drinks, snacks, and lunch to go around. Hope to see you there!

Custom JOINs with Django's query.join()

September 28 2009 by Colin Copeland

Django's ORM is great. It handles simple to fairly complex queries right out of the box without having to write any SQL. If you need a complicated query, Django lets you use .extra(), and you can always fall back to raw SQL if need be, but then you lose the ORM's bells and whistles. So it's always nice to find solutions that allow you to tap into the ORM at different levels.

Recently, we were looking to perform a LEFT OUTER JOIN through a many-to-many relationship. For lack of a better example, let's use a Contact model (crm_contact), which has many Phones (crm_phone):

from django.db import models

class Contact(models.Model):
    name = models.CharField(max_length=255)
    phones = models.ManyToManyField('Phone')
    addresses = models.ManyToManyField('Address')  # Address model not shown

class Phone(models.Model):
    number = models.CharField(max_length=16)

If we want to display each contact and corresponding phone numbers, looping through each contact in Contact.objects.all() and following the phones relationship will generate quite a few database queries (especially with a large contact table). select_related() doesn't work in this scenario either, because it only supports foreign key relationships. We can use extra() to add a select parameter, but tables=['crm_phone'] will not generate a LEFT OUTER JOIN. We need to explicitly construct the JOIN.

DISCLAIMER: The following method does work, but should not be considered best practice. That is, there may be a better way to accomplish the same task (please comment if so!). But after sparse Google results for similar scenarios, I figured it'd at least be useful to post what we discovered.

After digging around in django.db.models.sql for a bit, we found BaseQuery.join in query.py. Among the possible arguments, the most important is connection, which is "a tuple (lhs, table, lhs_col, col) where 'lhs' is either an existing table alias or a table name. The join corresponds to the SQL equivalent of: lhs.lhs_col = table.col". Further, the promote keyword argument will set the join type to be a LEFT OUTER JOIN.

Now we can explicitly set up the JOINs through crm_contact -> crm_contact_phones -> crm_phone:

contacts = Contact.objects.extra(
    select={'phone': 'crm_phone.number'}
).order_by('name')

# set up initial FROM clause
# OR contacts.query.get_initial_alias()
contacts.query.join((None, 'crm_contact', None, None))

# join to crm_contact_phones
connection = (
    'crm_contact',
    'crm_contact_phones',
    'id',
    'contact_id',
)
contacts.query.join(connection, promote=True)

# join to crm_phone
connection = (
    'crm_contact_phones',
    'crm_phone',
    'phone_id',
    'id',
)
contacts.query.join(connection, promote=True)
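
For reference, the SQL these two joins aim for can be tried outside Django. The following is a minimal sketch using the standard library's sqlite3 module with hand-made tables; the table and column names assume Django's default naming for the models above, and the rows are made-up sample data.

```python
import sqlite3


def contact_phone_rows():
    """Build three tiny tables and run the LEFT OUTER JOIN across them."""
    conn = sqlite3.connect(':memory:')
    conn.executescript("""
        CREATE TABLE crm_contact (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE crm_phone (id INTEGER PRIMARY KEY, number TEXT);
        CREATE TABLE crm_contact_phones (
            id INTEGER PRIMARY KEY, contact_id INTEGER, phone_id INTEGER);
        INSERT INTO crm_contact VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO crm_phone VALUES (1, '555-0100'), (2, '555-0101');
        INSERT INTO crm_contact_phones VALUES (1, 1, 1), (2, 1, 2);
    """)
    rows = conn.execute("""
        SELECT crm_contact.name, crm_phone.number AS phone
        FROM crm_contact
        LEFT OUTER JOIN crm_contact_phones
            ON crm_contact.id = crm_contact_phones.contact_id
        LEFT OUTER JOIN crm_phone
            ON crm_contact_phones.phone_id = crm_phone.id
        ORDER BY crm_contact.name, crm_phone.number
    """).fetchall()
    conn.close()
    return rows
```

Alice comes back once per phone number, and Bob, who has no phones, still appears with a NULL (None) phone. That second part is what an inner join would lose.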

It's a little verbose, but it accomplishes our goal. I used hardcoded table names/columns in the connection tuple to make it easier to follow, but we can also extract this information from the objects themselves:

contacts = Contact.objects.extra(
    select={'phone': 'crm_phone.number'}
).order_by('name')

# set up initial FROM clause
# OR contacts.query.get_initial_alias()
contacts.query.join((None, Contact._meta.db_table, None, None))

# join to crm_contact_phones
connection = (
    Contact._meta.db_table, # crm_contact
    Contact.phones.field.m2m_db_table(), # crm_contact_phones
    Contact._meta.pk.column, # etc...
    Contact.phones.field.m2m_column_name(),
)
contacts.query.join(connection, promote=True)

# join to crm_phone
connection = (
    Contact.phones.field.m2m_db_table(),
    Phone._meta.db_table,
    Contact.phones.field.m2m_reverse_name(),
    Phone._meta.pk.column,
)
contacts.query.join(connection, promote=True)

This results in a row for each phone number (Cartesian product), but we can print out each contact and corresponding phone numbers (with a single SQL statement) quickly in a template using {% ifchanged %}:

<h1>Contacts</h1>

{% for contact in contacts %}
    {% ifchanged contact.name %}
        <h2>{{ contact.name }}</h2>
    {% endifchanged %}
    <p>Phone: {{ contact.phone }}</p>
{% endfor %}
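
If you'd rather do the grouping in the view than in the template, itertools.groupby does the same job as {% ifchanged %} on rows that are already ordered by name. This is a minimal sketch, assuming each row is a (name, phone) pair; group_phones and the sample rows are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows as the joined query would return them, ordered by name.
ROWS = [
    ('Alice', '555-0100'),
    ('Alice', '555-0101'),
    ('Bob', None),          # contact with no phone (NULL from the outer join)
    ('Carol', '555-0199'),
]


def group_phones(rows):
    """Collapse consecutive rows with the same name into (name, [phones])."""
    grouped = []
    for name, group in groupby(rows, key=itemgetter(0)):
        phones = [phone for _, phone in group if phone is not None]
        grouped.append((name, phones))
    return grouped
```

Like {% ifchanged contact.name %}, this only merges adjacent rows, so it depends on the order_by('name') in the query, and two different contacts who share a name would be merged.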

Web Developer for Hire

September 23 2009 by Colin Copeland

We're pleased to announce that Caktus is looking for a developer to join our team on a contract basis!

What do we do? We build custom web applications for local and remote clients using a variety of open-source technologies. We are a small team founded in the Chapel Hill/Carrboro area (currently residing in Carrboro Creative Coworking) who believe in face-to-face contact and employ agile development techniques that emphasize teamwork and collaboration.

We're looking for a strong software developer who enjoys working on a team and is excited to learn and experiment with new technologies. We do have a preference for local candidates, but will consider all submissions. Initial work will focus on maintaining small Django-powered websites. This will involve HTML/CSS (including converting Photoshop designs), Django Templates, and writing Unit Tests. Later work will involve creating and integrating Django apps into larger projects, deployment, and database work.

You will be working in Linux (Debian-flavor) production environments with Apache and WSGI. Python/Django experience is not required, but will be used on a daily basis. Relational database experience is a must. HTML/CSS and JavaScript experience are also a must, and jQuery is a plus.

If you're interested in this position, please send us your resume, some example code, links to any open-source projects you've contributed to, and expected compensation. We're excited to bring on a new team member!

Open Source Django Projects from Caktus Consulting Group

September 07 2009 by Tobias McNulty

At Caktus we're big fans of reusing code. We leverage many open source projects--especially Django apps--to accomplish a variety of tasks. In addition, we've written quite a few pluggable apps over the past two years that we reuse over and over again for different projects. As a way of giving back to the community, we've polished and released a portion of that code as open source ourselves. While some of the projects have been available on Google Code for a while now, we just put together a consolidated list of open source Django projects on our web site to serve as a jumping off point for all the projects we like, we contributed to, and we created. Enjoy!

Caktus Consulting Group, LLC sponsors DjangoCon 2009

September 05 2009 by Tobias McNulty

Django is a tool we use on a daily basis to build fantastic web apps here at Caktus, and DjangoCon is the annual conference for Django developers and other community members. We are proud to announce that Caktus Consulting Group, LLC is sponsoring DjangoCon 2009!

This year, the conference is being held the week of September 7th in the beautiful city of Portland, Oregon. Two Caktus partners, Colin and myself, will be attending. We hope to see you there!

Creating recursive, symmetrical many-to-many relationships in Django

August 14 2009 by Tobias McNulty

In Django, a recursive many-to-many relationship is a ManyToManyField that points to the same model in which it's defined ('self'). A symmetrical relationship is one in which, when a.contacts = [b], a is in b.contacts.

In changeset 8136, support for through models was added to the Django core. This allows you to create a many-to-many relationship that goes through a model of your choice:

from django.db import models


class Contact(models.Model):
    contacts = models.ManyToManyField(
        'self',
        through='ContactRelationship',
        symmetrical=False,
    )


class ContactRelationship(models.Model):
    types = models.ManyToManyField(
        'RelationshipType',
        related_name='contact_relationships',
        blank=True,
    )
    from_contact = models.ForeignKey('Contact', related_name='from_contacts')
    to_contact = models.ForeignKey('Contact', related_name='to_contacts')

    class Meta:
        unique_together = ('from_contact', 'to_contact')

According to the Django Docs, you must set symmetrical=False for recursive many-to-many relationships. Sometimes--for a recent case in django-crm, for example--what you really want is a symmetrical, recursive many-to-many relationship.

The trick to getting this working is understanding what symmetrical=True actually does. From what we can tell after a brief look through the Django core, symmetrical=True is simply a utility that (a) creates a second, reverse relationship in the many-to-many table, and (b) hides the field in the related model (in this case the same model) from use by appending a '+' to its name.

Since you normally have to create many-to-many relationships manually when a through model is specified, the solution is simply to leave symmetrical=False (otherwise it'll raise an exception) and create the reverse relationship manually yourself via the through model:

crm.ContactRelationship.objects.create(
    from_contact=contact_a,
    to_contact=contact_b,
)
crm.ContactRelationship.objects.create(
    from_contact=contact_b,
    to_contact=contact_a,
)

Additionally, you'll have to do a little cleanup to make sure both sides of the relationship are removed when one is removed, but otherwise this should achieve the same effect as setting symmetrical=True in other many-to-many relationships.
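
One way to keep the two sides in step is to always go through a pair of helpers. The following is a minimal sketch, not code from django-crm: relate and unrelate are hypothetical names, and relationships is assumed to be the through model's manager (for example crm.ContactRelationship.objects), of which only get_or_create() and filter().delete() are used.

```python
def relate(relationships, contact_a, contact_b):
    """Create both directions of the relationship, skipping rows that exist."""
    relationships.get_or_create(from_contact=contact_a, to_contact=contact_b)
    relationships.get_or_create(from_contact=contact_b, to_contact=contact_a)


def unrelate(relationships, contact_a, contact_b):
    """Remove both directions so neither side is left dangling."""
    relationships.filter(from_contact=contact_a, to_contact=contact_b).delete()
    relationships.filter(from_contact=contact_b, to_contact=contact_a).delete()
```

Using get_or_create rather than create means calling relate twice doesn't trip over the unique_together constraint on the through model.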

To hide the other side of the related manager, you can append a '+' to the related_name, like so:

class Contact(models.Model):
    contacts = models.ManyToManyField(
        'self',
        through='ContactRelationship',
        symmetrical=False,
        related_name='related_contacts+',
    )

Good luck and feel free to comment with any questions!

Towards a Standard for Django Session Messages

June 19 2009 by Tobias McNulty

Django needs a standard way in which session-specific messages can be created and retrieved for display to the user. For years we've been surviving using user.message_set to store messages that are really specific to the current session, not the user, or using the latest and greatest Django snippet, pluggable app, or custom crafted middleware to handle messages in a more appropriate way.

While this has been discussed at length in Ticket #4604 as well as on Django Snippets, here are a few reasons that user.message_set is the wrong implementation:

  • No message_set exists for AnonymousUsers in Django, so you can't display any messages to them.
  • What happens when the same user is logged in from two different browsers and completing two different tasks, simultaneously? When using user.message_set to store feedback for the user, the messages will be distributed on a first come first served basis, with no regard for what session actually generated what feedback. For this reason it's bad to get in the habit of using user.message_set for messages like "Article updated successfully," or other messages that really have no context outside the current session.

I've outlined a few characteristics below that I believe would make up a solid session messaging contrib app. Please feel free to comment if I missed anything, or if you've got beef with any of my points. This is in many ways a work in progress, so I'll update it as often as I can.

  • Standards. The implementation ought to make it clear how multiple messages are to be stored and retrieved for display to the user. Maybe you need to push multiple messages onto the stack from a single view, or your app performs multiple redirects through different views.
  • Persistence. In the case where your app redirects through multiple views, it's not acceptable for session messages to disappear. The implementation needs to provide facilities for determining whether or not the messages were actually displayed, and delay purging the message list if necessary.
  • Flexibility. Support the case where a large number of independent, pluggable apps do messaging in the same project (sometimes for the same request), but don't require it. Display all the messages created by all the apps, but don't break (or lose messages) if one of the apps doesn't happen to use the messaging implementation.
  • Efficiency. Avoid storing messages in the database (or another persistent store) if possible. While it's possible to use memcache as a session backend, this isn't always possible. One potential implementation would be to store shorter messages directly in a cookie, but provide a fallback to session-based storage for longer messages.

Here's the implementation we use at Caktus, which is far from complete but does address some of these points. This code is based on a number of snippets as well as attachments to the above referenced ticket. It could be improved by purging each message independently when it is actually retrieved and adding facilities for cookie-based storage. While I haven't used it yet, django-notify looks a lot better than this and I'm excited about trying it out.

from django.utils.encoding import StrAndUnicode
from django.contrib.sessions.backends.base import SessionBase

MESSAGES_NAME = '_messages'

SessionBase.get_messages = lambda self: self[MESSAGES_NAME]

def _session_get_and_delete_messages(self):
    messages = self.pop(MESSAGES_NAME, [])
    self[MESSAGES_NAME] = []
    return messages
SessionBase.get_and_delete_messages = \
  _session_get_and_delete_messages

def _session_create_message(self, message):
    self[MESSAGES_NAME].append(message)
    self.modified = True
SessionBase.create_message = _session_create_message

class SessionMessagesMiddleware(object):
    """
    To store messages or other user feedback in the session, add this
    class to your middleware.
    
    In your views, call request.session.create_message('the message') to
    add a message to the session.
    
    In your template(s), do this:
    
        {% if request.messages %}
            {% for message in request.messages %}<li>{{ message|escape }}</li>{% endfor %}
        {% endif %}
    
    Messages will NOT be erased from the session if you never access request.messages.
    """
    
    class LazyMessages(StrAndUnicode):
        """
        A lazy proxy for session messages.
        """
        def __init__(self, session):
            self.session = session
            super(SessionMessagesMiddleware.LazyMessages, self).__init__()
            
        def __iter__(self):
            return iter(self.messages)
    
        def __len__(self):
            return len(self.messages)
    
        def __nonzero__(self):
            return bool(self.messages)
    
        def __unicode__(self):
            return unicode(self.messages)
    
        def __getitem__(self, *args, **kwargs):
            return self.messages.__getitem__(*args, **kwargs)
    
        def _get_messages(self):
            if not hasattr(self, '_messages'):
                self._messages = self.session.get_and_delete_messages()
            return self._messages
        messages = property(_get_messages)
    
    def process_request(self, request):
        if not hasattr(request, 'session'):
            raise AttributeError('Request has no attribute "session".  Make sure session middleware is running before SessionMessages middleware.')
        
        if MESSAGES_NAME not in request.session:
            request.session[MESSAGES_NAME] = []
        
        request.messages = \
          SessionMessagesMiddleware.LazyMessages(request.session)

Remote logging with Python logging and Django

June 09 2009 by Tobias McNulty

As part of my work on EveryWatt, our fledgling energy monitoring web site, I needed a way to consolidate log messages from all the data loggers we have running in a single place. If you're not familiar with it, Python's logging module is good stuff and worth checking out. We already used it for logging to files locally, and the module defines an HTTPHandler that can deliver log messages to a remote server via HTTP.

To implement the Django side, I wrote a lightweight pluggable app to receive the log messages and store them in the database. To use the app, just create an HTTPHandler that points to your Django site, and add it to a logger:

import logging
import logging.handlers
logger = logging.getLogger('mylogger')
http_handler = logging.handlers.HTTPHandler(
    'django.app.hostname:port',
    '/remotelog/your_app_slug/log/',
    method='POST',
)
logger.addHandler(http_handler)
logger.info('testing remote logging')

On the Django side, navigate to /admin/remotelog/logmessage/ and you should have a nice interface (courtesy of the Django admin) to filter, search, and sort log messages as they come in. The app is called django-remotelog, and it's up on Google Code. Check it out, and feel free to comment.
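
To see what actually arrives at the receiving end, you can reproduce HTTPHandler's encoding without a server. This is a minimal sketch under stated assumptions: encode_record and decode_record are hypothetical helpers (django-remotelog's own view may work differently), and 'localhost:8000' is a placeholder host that is never contacted. HTTPHandler builds its request body by url-encoding the dictionary returned by mapLogRecord(), which is what encode_record does by hand.

```python
import logging
import logging.handlers
from urllib.parse import parse_qs, urlencode


def encode_record(message):
    """Return the form-encoded body HTTPHandler would send for `message`."""
    handler = logging.handlers.HTTPHandler(
        'localhost:8000',                  # placeholder; no connection is made
        '/remotelog/your_app_slug/log/',
        method='POST',
    )
    record = logging.LogRecord(
        name='mylogger', level=logging.INFO, pathname='example.py', lineno=1,
        msg=message, args=(), exc_info=None,
    )
    # mapLogRecord() returns the record's attributes as a dict by default.
    return urlencode(handler.mapLogRecord(record))


def decode_record(body):
    """Turn the posted body back into a flat dict of strings."""
    return {key: values[0] for key, values in parse_qs(body).items()}
```

decode_record(encode_record('testing remote logging')) contains keys such as name, levelname, and msg. Every value arrives as a string, so a receiving view has to convert fields like levelno or created itself.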

Testing Django Views for Concurrency Issues

May 26 2009 by Tobias McNulty

At Caktus, we rely heavily on automated testing for web app development. We create tests for all the code we write, ideally before the code is written. We create tests for every bug we find and, resources permitting, ramp up the test suite with lots of random input and boundary testing.

Debugging concurrency issues or race conditions has long been a nightmare. There are only so many times you can double click the link in your web app that is generating some bizarre failure.

Using the Django test client, I created a little decorator that you can use in your unit tests to make sure a view doesn't blow up when it's called multiple times with the same arguments. If it does blow up, and you happen to be using PostgreSQL, chances are you can fix the issues by using Colin's previously posted require_lock decorator.

Here's the decorator for testing concurrency:

def test_concurrently(times):
    """ 
    Add this decorator to small pieces of code that you want to test
    concurrently to make sure they don't raise exceptions when run at the
    same time.  E.g., some Django views that do a SELECT and then a subsequent
    INSERT might fail when the INSERT assumes that the data has not changed
    since the SELECT.
    """
    def test_concurrently_decorator(test_func):
        def wrapper(*args, **kwargs):
            exceptions = []
            import threading
            def call_test_func():
                try:
                    test_func(*args, **kwargs)
                except Exception as e:
                    exceptions.append(e)
                    raise
            threads = []
            for i in range(times):
                threads.append(threading.Thread(target=call_test_func))
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            if exceptions:
                raise Exception('test_concurrently intercepted %s exceptions: %s' % (len(exceptions), exceptions))
        return wrapper
    return test_concurrently_decorator

To use this in a test, create a small function that includes the thread-safe code inside your test. Apply the decorator, passing the number of times you want to run the code simultaneously, and then call the function:

class MyTestCase(TestCase):
    def testRegistrationThreaded(self):
        url = reverse('toggle_registration')
        @test_concurrently(15)
        def toggle_registration():
            # perform the code you want to test here; it must be thread-safe 
            # (e.g., each thread must have its own Django test client)
            c = Client()
            c.login(username='user@example.com', password='abc123')
            response = c.get(url)
        toggle_registration()
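
The SELECT-then-INSERT failure mentioned in the docstring is easy to reproduce without Django or a database. The following is a minimal sketch with stand-ins: FakeTable plays the role of a table with a unique column, run_concurrently is a simplified cousin of test_concurrently that returns the exceptions instead of raising, and the threading.Barrier exists only to force the unlucky interleaving every time.

```python
import threading

THREADS = 5


class UniqueViolation(Exception):
    """Stand-in for a database integrity error."""


class FakeTable:
    """In-memory stand-in for a table with a unique column."""

    def __init__(self):
        self._rows = set()
        self._guard = threading.Lock()  # makes each single operation atomic

    def exists(self, key):
        with self._guard:
            return key in self._rows

    def insert(self, key):
        with self._guard:
            if key in self._rows:
                raise UniqueViolation(key)
            self._rows.add(key)


def run_concurrently(func, times):
    """Run func in `times` threads and return the exceptions they raised."""
    errors = []

    def call():
        try:
            func()
        except Exception as exc:
            errors.append(exc)

    threads = [threading.Thread(target=call) for _ in range(times)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


def racy_errors():
    """Every thread checks first, then all of them try to insert."""
    table = FakeTable()
    barrier = threading.Barrier(THREADS)

    def view():
        found = table.exists('alice')
        barrier.wait()  # force all "SELECTs" to finish before any "INSERT"
        if not found:
            table.insert('alice')

    return run_concurrently(view, THREADS)


def locked_errors():
    """Same logic, but check and insert happen under one lock."""
    table = FakeTable()
    table_lock = threading.Lock()

    def view():
        with table_lock:
            if not table.exists('alice'):
                table.insert('alice')

    return run_concurrently(view, THREADS)
```

With five threads, racy_errors() collects four UniqueViolation exceptions because only the first insert wins, while locked_errors() collects none. The lock here is the in-memory analogue of taking a table lock for the duration of the view.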

Explicit Table Locking with PostgreSQL and Django

May 26 2009 by Colin Copeland

By default, Django doesn't do explicit table locking. This is OK for most read-heavy scenarios, but sometimes you need guaranteed, exclusive access to the data. Caktus uses PostgreSQL in most of our production environments, so we can use the various lock modes it provides to control concurrent access to the data. Once we obtain a lock in PostgreSQL, it is held for the remainder of the current transaction. Django provides transaction management, so all we need to do is execute a SQL LOCK statement within a transaction, and Django and PostgreSQL will handle the rest.

Below is an example decorator we came up with to provide easy table-locking access in Django:

from django.db import transaction

LOCK_MODES = (
    'ACCESS SHARE',
    'ROW SHARE',
    'ROW EXCLUSIVE',
    'SHARE UPDATE EXCLUSIVE',
    'SHARE',
    'SHARE ROW EXCLUSIVE',
    'EXCLUSIVE',
    'ACCESS EXCLUSIVE',
)

def require_lock(model, lock):
    """
    Decorator for PostgreSQL's table-level lock functionality
    
    Example:
        @transaction.commit_on_success
        @require_lock(MyModel, 'ACCESS EXCLUSIVE')
        def myview(request):
            ...
    
    PostgreSQL's LOCK Documentation:
    http://www.postgresql.org/docs/8.3/interactive/sql-lock.html
    """
    def require_lock_decorator(view_func):
        def wrapper(*args, **kwargs):
            if lock not in LOCK_MODES:
                raise ValueError('%s is not a PostgreSQL supported lock mode.' % lock)
            from django.db import connection
            cursor = connection.cursor()
            cursor.execute(
                'LOCK TABLE %s IN %s MODE' % (model._meta.db_table, lock)
            )
            return view_func(*args, **kwargs)
        return wrapper
    return require_lock_decorator

This is, by no means, a perfect solution. Feel free to comment below.

Parsing Microseconds in a Django Form

May 26 2009 by Tobias McNulty

There's currently no way to accept microsecond-precision input through a Django form's DateTimeField. This is an acknowledged bug, but the official solution might not come very soon, because the real fix is non-trivial.

In the meantime, here's one approach that will work in most cases:

from django import forms

class DateTimeWithUsecsField(forms.DateTimeField):
    def clean(self, value):
        if value and '.' in value: 
            value, usecs = value.rsplit('.', 1) # rsplit in case '.' is used elsewhere
            usecs += '0'*(6-len(usecs)) # right pad with zeros if necessary
            try:
                usecs = int(usecs) 
            except ValueError: 
                raise forms.ValidationError('Microseconds must be an integer')
        else: 
            usecs = 0 
        cleaned_value = super(DateTimeWithUsecsField, self).clean(value)
        if cleaned_value:
            cleaned_value = cleaned_value.replace(microsecond=usecs)
        return cleaned_value

To use this in a model form, you can override the field like so:

class MyForm(forms.ModelForm):
    def __init__(self, *arg, **kwargs):
        super(MyForm, self).__init__(*arg, **kwargs)
        self.fields['date'] = DateTimeWithUsecsField()
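
The splitting and padding step is the only subtle part, so here it is on its own, without Django. This is a minimal sketch: parse_with_usecs is a hypothetical helper, and the single fixed format string is an assumption (the form field tries several input formats).

```python
from datetime import datetime


def parse_with_usecs(value, fmt='%Y-%m-%d %H:%M:%S'):
    """Parse 'YYYY-MM-DD HH:MM:SS[.ffffff]' into a datetime with microseconds."""
    usecs = 0
    if '.' in value:
        # rsplit in case '.' is used elsewhere in the value
        value, fraction = value.rsplit('.', 1)
        # '.25' means 250000 microseconds, so pad on the right to six digits
        usecs = int(fraction.ljust(6, '0'))
    return datetime.strptime(value, fmt).replace(microsecond=usecs)
```

For example, parse_with_usecs('2009-05-26 14:30:00.25') has microsecond == 250000. A fraction with more than six digits makes replace() raise a ValueError, because microseconds must be below 1000000. The field above has the same limit.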

Seamlessly switch off (and on) a Django (or other WSGI) site for upgrades

May 25 2009 by Tobias McNulty

In preparation for migrating the EveryWatt database from one machine to another, I wrote this little WSGI script to easily disable the site while I copy the data. Since it doesn't depend on Django or really anything else (other than a functioning WSGI server), you can use it for other upgrades, too.

This is useful for preventing updates to the database while you, for example, dump the database on one machine and load it on another. With everything else already in place on either side, the user should only see the "Upgrade in progress" message for a few minutes.

Since EveryWatt includes a number of data logger clients that upload utility meter readings to the site through its Open API, I wanted to make sure any POST attempts received a temporary failure message (the data logger will store the data and retry the POST every minute)--hence the 405 Method Not Allowed for all non-GET requests.

Here's the script:

import os
import sys

UPGRADING = False

#Calculate the project path based on the location of the WSGI script.
project_dir = os.path.dirname(__file__)
sys.path.append(project_dir)

def upgrade_in_progress(environ, start_response):
    upgrade_file = os.path.join(project_dir, 'media', 'html', 'upgrade.html')
    if os.path.exists(upgrade_file):
        response_headers = [('Content-type','text/html')]
        response = open(upgrade_file).read()
    else:
        response_headers = [('Content-type','text/plain')]
        response = 'Application upgrade in progress...please check back soon.'
    
    if environ['REQUEST_METHOD'] == 'GET':
        status = '503 Service Unavailable'
    else:
        status = '405 Method Not Allowed'
    start_response(status, response_headers)
    return [response]

if UPGRADING:
    application = upgrade_in_progress
else:
    os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'
    import django.core.handlers.wsgi
    application = django.core.handlers.wsgi.WSGIHandler()
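
If you want to check the status codes before flipping the switch on a live site, you can call the WSGI callable directly. The following is a rough sketch for Python 3 (where the response body must be bytes), using only the standard library; maintenance_app is a trimmed stand-in for upgrade_in_progress above, and call_app is a hypothetical helper, not part of the script.

```python
from wsgiref.util import setup_testing_defaults


def maintenance_app(environ, start_response):
    # Trimmed stand-in for upgrade_in_progress: plain-text message only.
    body = b'Application upgrade in progress...please check back soon.'
    if environ['REQUEST_METHOD'] == 'GET':
        status = '503 Service Unavailable'
    else:
        status = '405 Method Not Allowed'
    start_response(status, [('Content-type', 'text/plain')])
    return [body]


def call_app(app, method):
    """Invoke a WSGI app with a fake request and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ['REQUEST_METHOD'] = method
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status

    body = b''.join(app(environ, start_response))
    return captured['status'], body


get_status, get_body = call_app(maintenance_app, 'GET')
post_status, _ = call_app(maintenance_app, 'POST')
print(get_status)
print(post_status)
```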

And in case you need it, here's one way to dump a PostgreSQL database on one machine while you load it on another, to be run on the new host, as the database superuser:

pg_dump -h  -U   | psql 

Good luck and please post your questions/comments.

Eclipse Ganymede and Subclipse on Ubuntu - JavaHL (JNI) not available

May 21 2009 by Tobias McNulty

I finally got around to updating my Eclipse, PyDev, and Subclipse environment today, which I use for Django development.

Formerly I was using the SvnKit (pure-Java) libraries. SvnKit "felt" slow to me, compared to my command line SVN client, so this time I tried to get the JavaHL (JNI) libraries working.

For the record I'm using Ubuntu (jaunty) with Eclipse 3.4 (Ganymede). This version of Ubuntu comes with Subversion 1.5, so I need to install Subclipse 1.4. See:

http://subclipse.tigris.org/servlets/ProjectProcess?pageID=p4wYuA

I installed everything through the Eclipse update manager (minus SvnKit), but JavaHL didn't show up under Preferences -> Team -> SVN. The error message was: JavaHL (JNI) not available.

I had installed Eclipse manually (not through apt-get), so the solution was to install the JavaHL libraries:

apt-get install libsvn-java

and add the following line to my eclipse.ini (usually in the top level eclipse directory):

-Djava.library.path=/usr/lib/jni

Restart Eclipse, and you should be good to go!

Migrating from django-photologue 1.x to 2.x

March 27 2009 by Tobias McNulty

We're in the process of updating a web app for a client that was built last year about this time using Django and Photologue. Needless to say, there have been a lot of changes to both over the past year!

We were somewhat dismayed to find no easy upgrade path for photologue, and there are a number of model changes that mean you can't just run svn up and be done with it. Using the JSON output from ./manage.py dumpdata, we created a little Python script that handles the database migrations for three of the photologue models (Gallery, Photo, and PhotoSize). Save this in a script called migrate-photologue.py:

#!/usr/bin/python

import sys
import simplejson

if len(sys.argv) != 2:
    print 'Usage: %s ' % sys.argv[0]
    sys.exit(1)

REMOVE_COLUMNS = {
    'photologue.photo': (
        'photographer',
        'info',
    ),
}

RENAME_COLUMNS = {
    'photologue.photo': {
        'pub_date': 'date_added',
        'slug': 'title_slug',
    },
    'photologue.gallery': {
        'pub_date': 'date_added',
        'slug': 'title_slug',
    },
}

ADD_COLUMNS = {
    'photologue.photo': {
        'view_count': 0,
    },
    'photologue.photosize': {
        'upscale': False,
        'increment_count': 0,
    },
}

data = simplejson.load(open(sys.argv[1]))

for obj in data:
    fields = obj['fields']
    model = obj['model']
    for col in REMOVE_COLUMNS.get(model, []):
        if col in fields:
            fields.pop(col)
    for old_name, new_name in RENAME_COLUMNS.get(model, {}).iteritems():
        if old_name in fields:
            fields[new_name] = fields[old_name]
            fields.pop(old_name)
    for col, default_value in ADD_COLUMNS.get(model, {}).iteritems():
        fields[col] = default_value

print simplejson.dumps(data, indent=4)

The script is fairly simple, but back up your database first, just in case. If you need support for additional models, just add the changes you need to the dicts at the top of the file.
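
To see what the three dicts do before running the script against a real dump, here is a small sketch of the same transformation as a function, applied to one made-up fixture record. It is written for Python 3 (json and .items() instead of simplejson and .iteritems()); the function name migrate and the sample data are assumptions for illustration only.

```python
import json

REMOVE_COLUMNS = {'photologue.photo': ('photographer', 'info')}
RENAME_COLUMNS = {
    'photologue.photo': {'pub_date': 'date_added', 'slug': 'title_slug'},
}
ADD_COLUMNS = {'photologue.photo': {'view_count': 0}}


def migrate(data):
    """Apply the remove/rename/add rules to a list of fixture objects."""
    for obj in data:
        fields = obj['fields']
        model = obj['model']
        for col in REMOVE_COLUMNS.get(model, ()):
            fields.pop(col, None)
        for old_name, new_name in RENAME_COLUMNS.get(model, {}).items():
            if old_name in fields:
                fields[new_name] = fields.pop(old_name)
        for col, default_value in ADD_COLUMNS.get(model, {}).items():
            fields[col] = default_value
    return data


# Made-up record in the shape that dumpdata produces.
sample = [{
    'pk': 1,
    'model': 'photologue.photo',
    'fields': {
        'title': 'Sunset',
        'slug': 'sunset',
        'pub_date': '2008-03-27 10:00:00',
        'photographer': 'A. N. Other',
    },
}]

migrated = migrate(sample)
print(json.dumps(migrated, indent=4))
```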

During the upgrade, it might help to have two copies of the database running on the local machine, so you can switch back and forth between them at will. A typical migration might look like this:

./manage.py dumpdata photologue > photologue.json
./migrate-photologue.py photologue.json > photologue2.json
./manage.py sqlclear photologue | ./manage.py dbshell
svn up photologue # or however you do it
./manage.py syncdb
./manage.py loaddata photologue2.json
./manage.py sqlsequencereset photologue | ./manage.py dbshell # just in case

Of course, things will get more complicated if you have other models with foreign keys to any of the photologue models. You'll have to drop the constraints temporarily and then add them again after you finish the migration, or take the plunge and write the SQL to do the migration while keeping your database relationships intact.

Overriding Django admin templates for fun and profit

January 20 2009 by Alex Lemann

Motivation & Goal

I sometimes find the admin interface's lists of instances of models overwhelming. The filters and searching help, but it would be nice to get an overview of the data that is being shown. In particular, I wanted to generate a graph based on the filters selected by the user, so that only items displayed after a filter would be graphed.

For example, if you have a Post model in your Blog application and a filter by author, this code might graph the number of Posts per day of the week to get a sense of when to release your next Post. But, if you select an author from the filter you might want to just graph the number of Posts created only by that one author on each day of the week.
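
As a rough illustration of the numbers such a graph would be built from, here is a plain-Python sketch that counts posts per weekday, optionally restricted to one author. It does not use Django; the posts list and the posts_per_weekday function are made up for this example.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
            'Friday', 'Saturday', 'Sunday']

# Made-up (author, publication date) pairs standing in for Post objects.
posts = [
    ('alice', date(2009, 1, 5)),   # a Monday
    ('bob', date(2009, 1, 5)),
    ('alice', date(2009, 1, 7)),   # a Wednesday
    ('alice', date(2009, 1, 12)),  # a Monday
]


def posts_per_weekday(posts, author=None):
    """Count posts per weekday; if author is given, count only theirs."""
    counts = Counter()
    for post_author, published in posts:
        if author is None or post_author == author:
            counts[WEEKDAYS[published.weekday()]] += 1
    return counts


all_counts = posts_per_weekday(posts)
alice_counts = posts_per_weekday(posts, author='alice')
print(dict(all_counts))
print(dict(alice_counts))
```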

Django has amazing documentation. There is an example in the docs describing how to override the change_form.html template, but no example overriding the change_list.html file. The change_list.html template is the template that describes the list of a particular model's objects in the admin interface. And, there is especially no example that uses the selected filters to change the content of the list of objects in the admin interface.

Process

Following the documentation above, I overrode the templates/admin/my_app/my_model/change_list.html file. This means that my changes to the templates will only show up on that particular object's change_list page. We want to show the graphs above the list of objects only for this one model. Examining the original change_list.html file, it turns out that the header for this list of objects corresponds to the pretitle block. The fact that it is so simple to override a single block in one template for one model in the admin interface speaks to how modular Django is. What do we want to put in that block? The filter data is buried deep within the context passed in to this page from a view. We certainly don't want to mess with the admin's views. That leaves template tags. Now, template tags are usually a last resort for me because of the unwieldy argument passing syntax. Caktus has our own internal way of doing this which is extremely similar to a django snippet posted recently. So, the template should look like:

{% extends "admin/change_list.html" %}
{% load graphs_from_filters %}

{% block pretitle %}
{% graphs_from_filter change_list=cl %}
{% endblock %}

Notice, we are passing the ChangeList variable, cl, from the context into our template tag.

Digging into a ChangeList

So, we've been passed a django.contrib.admin.views.main.ChangeList object. What the heck is that? Well, ChangeList objects hold django.contrib.admin.filterspecs.FilterSpec objects, which give us the name of the model, the class, and the id of the particular object. We use them to build a dictionary that maps each filter's name to the object it is filtering by. It looks something like this:

from django import template
from caktus.django.templatetags import parse_args_kwargs, CaktNode
import project.graphs as graphs

class GraphFilterNode(CaktNode):
    def render_with_args(self, context, change_list):
        # Map each active filter's title to the object it filters by.
        selected = {}
        for filter_spec in change_list.filter_specs:
            if filter_spec.lookup_val:
                selected[filter_spec.title()] = filter_spec.field.rel.to.objects.get(id=filter_spec.lookup_val)
        return '<img src="%s" />' % graphs.graph_url_from_filters(selected)

register = template.Library()

@register.tag
def graphs_from_filter(parser, token):
    """
    Usage {% graphs_from_filter change_list=<change_list_object> %}
    """
    tag_name, args, kwargs = parse_args_kwargs(parser, token)
    return GraphFilterNode(*args, **kwargs)

Disclaimer

I know very little about Django internals. This was mostly worked out through ipython, introspection, and reading some code. There is a little bit of validation of this method in Django ticket #3096, but as usual the internal Django structures might change and break your code. This happened to me when this ticket got resolved. I think that what I have now is a better solution, but is it the best one?

Why Caktus Uses Django

January 13 2009 by Tobias McNulty

Here at Caktus, we use the popular Django web framework for a lot of our custom web application development. We don't use Django simply because it's popular, easy to learn, or happened to be the first thing we found. We've written web apps in PHP, Java, and Ruby on Rails--all before we discovered Django--but were never quite satisfied. Following are just a few of the reasons that we both enjoy working with Django and believe it gives you (the client) the best end-product.

Django is Business-Friendly

Django is open source, free, and published under a "do anything you like" license, so it can be used to create all kinds of products, including proprietary business web apps. In addition to a flexible license, Django has a truly thriving user community and is being constantly improved by web developers like ourselves across the globe.

Built-in Admin Interface

Web application development often starts with the "data model." A data model defines the ways in which all the different pieces of information--such as customer names and addresses or product descriptions--are organized and related to each other in the database. Finding the right data model takes time and it's important to get it right, because a lot of development decisions will be based on the way your information is organized and accessible. When you're building a web application from the ground up--something we do every day at Caktus--you want the flexibility to experiment with your data model and "see" what all the different options look like.

This is where Django's built-in admin interface comes in. From the beginning, Django has included an automatically generated interface that lets you see and edit what's in your database. It knows the structure of your data and puts together a set of search and listing pages and custom web forms for creating, modifying, and otherwise managing your data. It lets you evaluate your data model up front before making a big investment in other parts of your web app. For some sites, the admin interface even makes up a big part of the final product (e.g., for sites that primarily publish content, such as news organizations). And, we've found, the automatically generated admin interface is a powerful tool for showing potential clients what a web app can do.

I Trust Django With My Data

At Caktus we put a strong emphasis on "data integrity." What is data integrity? Kevin already wrote a great post about what it is and why you should care about data integrity. In a nutshell, the "integrity" of data refers to its "completeness" or validity as a whole. For example, you probably want to limit the products that people can order on your web site to those that you actually stock in the warehouse. Modern "relational database management systems" provide integrity "checks" for your data that verify its appropriateness--based on the conditions you supply--for a given table in the database. When you build a data model in Django, you specify the nature or "type" of each column in your database and can even specify "constraints" on the data that--if your database server supports it--will be enforced at the database level in addition to the application. While this is always a good thing, it's even more important if other programs or users will be connecting to your database in addition to your web app. While Django does this out of the box, another popular web framework requires some under the hood "hacking" to achieve the same peace of mind about your data. On a side note, in addition to preferring Django for web app development, Caktus also prefers PostgreSQL for data storage. Our friends over at Summersault have already written a good summary describing why PostgreSQL is often the best choice for web app development, so I won't repeat the reasons here. We trust the Django + PostgreSQL combination so much that we even wrote our own CRM and bookkeeping package to keep track of our clients, projects, and all the related financial transactions.

Django is Written in Python

Python is a great language with no shortage of facilities and a huge (and growing) user base. A lot of Google's infrastructure is written in Python, and it is the only language supported by the initial release of their App Engine service. According to python.org:

[Python] offers strong support for integration with other languages and tools, comes with extensive standard libraries, and can be learned in a few days. Many Python programmers report substantial productivity gains and feel the language encourages the development of higher quality, more maintainable code.

Based on Caktus' experience writing Django web apps over the past 1.5+ years, this couldn't be more true.

Separation of Application Components

Django uses a variation of the Model View Controller (MVC) architecture that ensures all the different pieces of your application end up in the right place and, for larger projects, lets people with different skills work on the things they do best without getting in each other's way. Moreover, Django implements its own very simple "template language" for generating web pages. While some may view its simplicity as a curse, it is actually a blessing in disguise: by allowing only very simple constructs in the template, Django forces you to keep your business logic in the controller (what Django calls a "view") where it belongs. At Caktus, we're not just web developers. We're web engineers with a passion for web apps that not only work, feel, and look great, but also have the capacity to grow, improve, and continue to perform long into the future without breaking the bank. That said, we're truly thrilled about the Python/Django + PostgreSQL combination.

minibooks: Small Business Bookkeeping

January 07 2009 by Colin Copeland

Caktus released minibooks (open-sourced under the AGPL) as a bookkeeping package for small tech agencies. Boasting a double-entry accounting system, customer relationship management (CRM) and transaction reconciliation, minibooks provides a clean, multiuser web-based interface to manage simple accounting needs for small businesses.

minibooks was originally developed out of our frustration with single-user, desktop-oriented accounting packages like QuickBooks. We wanted a team-accessible system, so everyone on our VPN could access the CRM and manage bookkeeping tasks remotely. So Caktus developed a lightweight web app using Django and PostgreSQL to handle our basic needs. We use it every day at Caktus, so we're continually improving it and adding features (most recently recurring transactions and flags to monitor delivered and undelivered exchanges) to make things easier.

I'd like to spend some time highlighting a few of the many great features found in minibooks:

ExchangeType Model

Invoices, Receipts, Orders, Purchases, etc., are represented by a single, generalized Exchange model in minibooks. At a very low level, they all do the same thing (record an exchange between two entities), but vary slightly in their characteristics. So, they all reside in the same database table and are distinguished by their type. Exchanges have a foreign key relationship to the ExchangeType model.

ExchangeTypes provide a powerful interface to streamline repetitive tasks and customize interface elements based on sets of characteristics. For example, if most purchases are made through your credit card, you can create a Purchase ExchangeType to credit your Credit Card Account by default while still allowing you to debit each item on the receipt to a separate account (e.g., Food Expenses, Office Supplies, etc.). Invoices can also use this system with accounts receivable (accrual accounting). Simply set up an Invoice ExchangeType with Accounts Receivable as the common account, and separate income credit accounts (e.g., Consulting, Hosting, etc.) will be available for each item on the invoice.
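
The double-entry idea behind a common account can be sketched in a few lines of plain Python. This is not minibooks' actual code; the function name build_entries, the tuple layout, and the sample amounts are all assumptions made for illustration.

```python
from decimal import Decimal


def build_entries(common_account, common_side, items):
    """Return balanced (account, side, amount) entries for one exchange.

    items is a list of (account, amount) pairs, one per line item. Each
    item is posted to the side opposite the common account, and the common
    account receives the total on its own side.
    """
    item_side = 'debit' if common_side == 'credit' else 'credit'
    entries = [(account, item_side, amount) for account, amount in items]
    total = sum((amount for _, amount in items), Decimal('0'))
    entries.append((common_account, common_side, total))
    return entries


# A purchase: credit the card once, debit each item to its own account.
purchase = build_entries('Credit Card', 'credit', [
    ('Food Expenses', Decimal('12.50')),
    ('Office Supplies', Decimal('30.00')),
])

debits = sum(amount for _, side, amount in purchase if side == 'debit')
credits = sum(amount for _, side, amount in purchase if side == 'credit')
print(debits, credits)
```

An invoice would use the same function with 'Accounts Receivable' as the common account on the debit side and the income accounts as the items.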

Quick Search

minibooks' CRM stores all of the contacts, businesses and projects associated with Caktus. This way everything is consolidated in one spot, from a client phone number to a project invoice. Being able to jump around quickly is a necessity, so we created an AJAX auto-complete search bar that's displayed at the top of every page. Need to enter a receipt from the client meeting at the coffee shop or quickly find a phone number? Just type the first few letters and minibooks will search project, business and customer names and emails and return a list of matches instantly. Then just arrow down and hit return or click the one you're looking for. Better yet, the Quick Search field is accessible through the "f" access key, so hit Control+Alt+F on a Mac or Shift+Alt+F on Linux (and Windows?) and you'll never have to leave the keyboard!

Transaction Reconciliation

Balancing your checkbook can be a slow and tedious process, so we tried to ease the process with your business accounts. When a monthly credit card statement arrives, jump over to the Accounts tab and go to the credit card ledger. Here you'll find all credits (purchases) and debits (payments) ordered by date alongside a running total. Once you OK the amount on the credit card statement, check the checkbox next to that transaction and, using AJAX, minibooks will update the current reconciled balance! Now you can make sure your bookkeeping records always match your bank and credit card statements with ease.
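
The arithmetic behind the ledger view is simple enough to sketch. The following plain-Python example is an illustration only, not minibooks' implementation; it assumes signed amounts (purchases positive, payments negative) and a made-up tuple layout of (date, memo, amount, reconciled flag).

```python
from decimal import Decimal


def ledger(transactions):
    """Return (rows with a running total, reconciled balance)."""
    running = Decimal('0')
    reconciled = Decimal('0')
    rows = []
    for when, memo, amount, is_reconciled in transactions:
        running += amount
        if is_reconciled:
            # Only checked-off transactions count toward this balance.
            reconciled += amount
        rows.append((when, memo, amount, running))
    return rows, reconciled


rows, reconciled = ledger([
    ('2009-01-02', 'Office supplies', Decimal('30.00'), True),
    ('2009-01-05', 'Client lunch', Decimal('12.50'), True),
    ('2009-01-10', 'Payment', Decimal('-40.00'), False),
])
for row in rows:
    print(row)
print('Reconciled balance:', reconciled)
```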

Recurring Exchanges

Caktus provides web and email hosting services for our clients, so we wanted an easy way to automatically generate invoices for recurring services. The exchange creation form has the option to repeat items based on a set interval (days, weeks, months and years). Just set up cron to access /ledger/cron/ every night and all of your recurring exchanges will be automatically generated. An itemized email of generated invoices and other exchanges will be sent to you as well.

LaTeX Exchange and Report Generation

minibooks uses LaTeX to generate PDFs for exchanges (Invoices, Receipts, Member Distributions, etc.) and project reports. LaTeX files fit nicely into the Django template system, so you can use Django variables, tags and filters right in your .tex document. minibooks includes the template we use here at Caktus by default for you to use, but the possibilities are endless, so feel free to create your own and style your invoices any way you like! Further, the CRM automatically attaches the PDFs when sending exchanges to clients through minibooks.

Learn More

These are just some of the features found in minibooks. The source code and installation instructions can be found on the minibooks project page. An online demo (login:demo@minibooksdemo.com, pass:demo) is available for testing as well. minibooks is far from feature complete, so feel free to hack away at it and add features as you see fit!

Caktus' New Web Presence

October 27 2008 by Tobias McNulty

In honor of our recent one year anniversary (August 31st), we revamped our web presence to address a couple concerns about the original site:

  • We wanted to limit the technical language on the site. Our original "Services" page displayed a lot of buzz words and someone reasonably knowledgeable about the technology could easily get a sense of what we do, but the page didn't do as good a job of explaining what we do (and why you should hire us to do it) for individuals and organizations in need of a web application without a dedicated technical team to rely on. The new services page uses a few buzz words, but focuses mostly on the tangible, value-oriented features of what we do.
  • While Caktus started out as a general-purpose technology consulting company, we've honed and refined our skills over the past year+ to focus on what we do best: highly customized web application development, using Django. The old "Services" page detailed (almost) everything that we do and didn't convey a clear sense of what we do best. By simplifying "What We Do" into a few core services and explaining in better detail why we're so great to work with, we believe the new page is both simpler and more compelling.
  • We wanted to make it easier to feature our clients on the new site, so we added a couple tables to our database backend that store some of our customer testimonials and recent projects. Leveraging Django's built-in admin interface, we got this off the ground in almost no time. Using jQuery's Galleria, we created a new portfolio with a little flourish.
  • Lastly, we wanted to simplify and clarify the site's design itself. We removed a few unnecessary borders and colors around the site's navigation, simplified the logo, and made all the text on the site just a little larger.

Take a peek and let us know what you think: www.caktusgroup.com

Asterisk CDR & Django integration with ODBC

October 13 2008 by Alex Lemann

Tobias already mentioned how Caktus uses Asterisk as our PBX. He also mentioned how we tested various frontends both for managing the Asterisk configuration and for interacting with Asterisk to, for example, check our voicemail. We were enticed by some of the client management solutions that we could plumb up with Asterisk. Caktus has a loose administration structure, which allows us to be flexible and not have levels of managers between clients and coders. But this flexibility can leave loose ends unchecked when the person in charge of a project is distracted for a day or two. We saw this as an opportunity where Caktus could add a level of group accountability and use some neat technology. We also wanted a tool that would integrate well with our current homegrown Django ERP/CRM as well as Trac, our preferred tool for project management. So, we decided to write some sweet code.

Asterisk provides Call Detail Records (CDR) information for billing calls. This is used for people reselling their asterisk setup on a per call basis which is not what we're doing, but it automatically records a lot of useful information about calls including who called whom when and how long the call lasted. We decided to tie into this information for our interface. Asterisk provides a CDR ODBC interface. ODBC is a generic interface which sits between applications wanting to use a database and the database server itself. This was useful since a lot of the built in features of asterisk rely on MySQL databases which we don't condone the use of. Instead, we used these instructions to setup our Asterisk ODBC interface using unixODBC in order to connect to our Postgres backend. This setup took a while to get all the pieces in place working together. It will take a lot of fiddling to get this working.

First, add an ODBC driver for your database. This is an example for using Postgres since that's our preference.

/etc/odbc-pgsql.ini:

[PostgreSQL]
Description = PostgreSQL driver for Linux & Win32
Driver = /usr/lib/odbc/psqlodbca.so
Setup = /usr/lib/odbc/libodbcpsqlS.so
FileUsage = 1

Add an ODBC interface for the database of the Django project where you want the CDR data to show up. Fill in the blanks with the correct information for your configuration.

/etc/odbc.ini:

[django_odbc]
Description = PostgreSQL Asterisk
Driver = PostgreSQL
Servername = localhost
UserName = django_db_user
Password = django_db_password
Database = django_db_name
Port = 5432
Option = 3

Tell asterisk how to connect to the unixODBC server to log CDR data.

/etc/asterisk/cdr_odbc.conf:

[global]
dsn=django_odbc
username=django_db_user
password=django_db_password
loguniqueid=yes
dispositionstring=yes
table=cdr
usegmtime=no

Tell asterisk to use the CDR/ODBC configuration we just configured to store CDR data.

/etc/asterisk/modules.conf:

load => cdr_odbc.so

Now, the database must be set up as well. Use one of the schemas provided by the Asterisk project for the Cdr table in the database. I used this data to create a model in our Django project and imported it into our models.py file using "python ./manage.py inspectdb".

class Cdr(models.Model):
  acctid = models.TextField(primary_key=True)
  calldate = models.DateTimeField()
  clid = models.CharField(max_length=80)
  src = models.CharField(max_length=80)
  dst = models.CharField(max_length=80)
  dcontext = models.CharField(max_length=80)
  channel = models.CharField(max_length=80)
  dstchannel = models.CharField(max_length=80)
  lastapp = models.CharField(max_length=80)
  lastdata = models.CharField(max_length=80)
  duration = models.IntegerField()
  billsec = models.IntegerField()
  disposition = models.CharField(max_length=45)
  amaflags = models.IntegerField()
  accountcode = models.CharField(max_length=20)
  uniqueid = models.CharField(max_length=32)
  userfield = models.CharField(max_length=255)

  class Meta:
    db_table = u'cdr'

  def __str__(self):
    return "%s -> %s" % ( self.src, self.dst )

Storing real-time call information

Since CDR data is only needed for per call billing information, the CDR information is not stored until all the data comes in, after a call is completed. We wanted to be able to create notes on a call as it was happening, so we created a new table, Interactions. A row in this table will be populated as soon as a call is made or received and it will provide a place for our notes. In order to do this we used Asterisk's ODBC Functions. These allow you to make any SQL call, inserting or selecting data, from within your dialplan.

Create a simple Interaction table in your django project.

django-project/app/models.py

class Interaction(models.Model):
  project = models.ForeignKey(Project, null=True)
  contacts = models.ManyToManyField(User, related_name='interactions')
  memo = models.TextField(null=True)
  cdr = models.ForeignKey('Cdr', null=True, to_field='uniqueid', editable=False)

Now enable a connection for ODBC functions using the unixODBC settings from before.

/etc/asterisk/res_odbc.conf:

[django_odbc]
enabled => yes
dsn => django_odbc
pre-connect => yes

Tell asterisk to use the func_odbc driver.

/etc/asterisk/modules.conf:

preload => func_odbc.so

Here is the actual SQL statement to be called from within the dialplan. We've added a bit of SQL to update the contact field as well based on who was called or who called us. It's probably too tied to our CRM model to be useful for you, but it's not that difficult to do, if you've made it this far. Here, you should replace app_interaction with the name of your Interaction table. It should not be important for security to escape this value since it's internal to asterisk.

/etc/asterisk/func_odbc.conf:

[LOG_INTERACTION]
dsn=django_odbc
write=INSERT INTO app_interaction (cdr_id,project,contacts,memo) VALUES ('${VAL1}',NULL,NULL,NULL);

A call to this function should be made from your dialplan (extensions.conf). This will create an Interaction and link it to the Cdr row for the call once that is populated using the unique id that asterisk assigns each call.

Incoming

/etc/asterisk/extensions.conf:

exten => _XX.,1,Set(ODBC_LOG_INTERACTION()=${UNIQUEID})
exten => _XX.,2,.....

Outgoing

/etc/asterisk/extensions.conf:

exten => _1NXXNXXXXXX,1,Set(ODBC_LOG_INTERACTION()=${UNIQUEID})
exten => _1NXXNXXXXXX,2,....
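
To make the ordering of events concrete, here is a small sketch of the same flow using Python's built-in sqlite3 module in place of Postgres and Asterisk. The tables are cut down to a few columns, and the unique id, phone numbers, and memo are made-up values; only the idea of linking the two rows through the call's unique id comes from the setup above.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE cdr (uniqueid TEXT, src TEXT, dst TEXT, duration INTEGER);
CREATE TABLE app_interaction (
    id INTEGER PRIMARY KEY, cdr_id TEXT, memo TEXT
);
""")

unique_id = '1223900000.42'  # stand-in for Asterisk's ${UNIQUEID}

# 1. The dialplan logs the interaction as soon as the call starts.
conn.execute(
    "INSERT INTO app_interaction (cdr_id, memo) VALUES (?, NULL)",
    (unique_id,))

# 2. Someone adds notes while the call is still in progress.
conn.execute(
    "UPDATE app_interaction SET memo = ? WHERE cdr_id = ?",
    ('Discussed hosting renewal', unique_id))

# 3. The CDR row only appears after the call has completed.
conn.execute(
    "INSERT INTO cdr (uniqueid, src, dst, duration) VALUES (?, ?, ?, ?)",
    (unique_id, '9195550100', '200', 95))

# The shared unique id now ties the notes to the call details.
row = conn.execute(
    "SELECT i.memo, c.src, c.dst, c.duration "
    "FROM app_interaction i JOIN cdr c ON c.uniqueid = i.cdr_id"
).fetchone()
print(row)
```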

Future plans.

Store our voicemail in a database. Also, we've had the CallerID information displayed on our softphones populated by our CRM information, but this is not currently working. It would definitely be worth looking at again. It would help ease the context switch of stopping what we're doing and picking up the phone. It would also let us better determine who the call is bound for and let that person pick up, reducing the number of folks a client has to talk to and sparing us from transferring them around.