Configuring a Jenkins Slave

January 10 2012 by Colin Copeland

We're pretty avid testers here at Caktus, and when one of our Django projects required upgrading to Python 2.7, we also needed to upgrade our Jenkins build environment. Luckily, Jenkins supports distributed builds, which allow a master install to delegate tasks to slave instances. This way we can continue to run our primary build system on Ubuntu 10.04, which defaults to Python 2.6, and delegate tasks to an Ubuntu 11.04 environment running Python 2.7. The setup is fairly easy, but since I didn't find much out there already, I figured I'd write up a quick post outlining what we did.

To start, we'll need a new machine. I set up an Ubuntu 11.04 instance on Linode. Then SSH in, upgrade the packages, and install a Java Runtime Environment:

$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install default-jre

That's the only package Jenkins needs by default. Next we'll set up a user for Jenkins to SSH as. To do this, we'll add a new user to the system and copy the master's SSH public key into its authorized keys file:

$ sudo useradd -m jenkins
$ sudo -u jenkins mkdir /home/jenkins/.ssh
$ sudo -u jenkins vim /home/jenkins/.ssh/authorized_keys2

Now the Jenkins master can SSH to the slave without a password. Next we need to configure the Jenkins master to connect to the slave. Head over to the master's web interface, navigate to "Manage Jenkins" and then "Manage Nodes", click "New Node" in the sidebar, and add a Dumb Slave. On the following page, fill in these fields:

  • # of executors: 2 (controls the number of concurrent builds)
  • Remote FS root: /home/jenkins
  • Labels: python27 natty
  • Usage: Leave this machine for tied jobs only
  • Launch method: Launch slave agents on Unix machines via SSH. Also fill in the Host field with the address of your slave machine.

Hit save and your Jenkins master should open a connection to your slave machine. To use the new slave machine, update an existing Jenkins job and set the "Restrict where this project can be run" Label Expression to "python27". You'll need to install any project dependencies on the slave for it to build properly, but that's basically it!

Class-based views in Django 1.3

December 29 2011 by Dan Poirier

Django class-based views


Introduction

Django 1.3 added class-based views, but provided very little documentation explaining what they are or how to use them. So here's a basic introduction.


Example of a very basic class-based view

Let's start with an example of a very basic class-based view.

urls.py:

...
url(r'^$', MyViewClass.as_view(), name='myview'),
...

views.py:

from django.views.generic.base import TemplateView

class MyViewClass(TemplateView):
    template_name = "index.html"

    def get(self, request, *args, **kwargs):
        context = {}  # compute what you want to pass to the template
        return self.render_to_response(context)

This will render your template index.html with the context you computed and return it as the content of an HttpResponse.


Introduction to class-based views

Now that we've seen the obligatory example, how about some instructions?

  • To create a class-based view, start by creating a class that inherits from django.views.generic.View or one of its subclasses.

  • In your URLconf, specify the view as the name of the new class, plus .as_view():

    url(r'urlpattern', MyViewClass.as_view(), ...)

  • In your class, write a get method that takes as arguments self (as always), request (the HttpRequest), and any other arguments captured by your URLconf.

  • In your get method, use the same logic you'd have used in an old view, except that you can assume the request method is GET. Return an HttpResponse as usual.

  • If you need to handle POST, write a post method, just like your get method except that you can assume the request method is POST.

  • Any request method that you don't write a handler method for will automatically get back a "method not allowed" response; you don't have to do anything special.

Example:

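from django.http import HttpResponse
from django.views.generic import View

class MyViewClass(View):
    def get(self, request, arg1, keyword=None):
        # Only GET requests reach this method
        return HttpResponse('GET with arg1=%s' % arg1)

    def post(self, request, arg1, keyword=None):
        # Only POST requests reach this method
        return HttpResponse('POST with arg1=%s' % arg1)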
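Under the hood, the base View routes each request to the handler method named after the HTTP method. The following pure-Python model sketches that lookup and the automatic "method not allowed" fallback. It is not Django's code: SimpleView, Response, and GreetingView are hypothetical stand-ins so the sketch runs without Django, and the real View does more bookkeeping than this.

```python
class Response(object):
    """Hypothetical minimal response: just a body and a status code."""
    def __init__(self, content, status=200):
        self.content = content
        self.status = status


class SimpleView(object):
    """Toy model of how django.views.generic.View picks a handler."""
    http_method_names = ['get', 'post', 'put', 'delete', 'head', 'options', 'trace']

    def dispatch(self, request_method, *args, **kwargs):
        method = request_method.lower()
        # Only look up handlers for recognized HTTP method names
        if method in self.http_method_names:
            handler = getattr(self, method, self.http_method_not_allowed)
        else:
            handler = self.http_method_not_allowed
        return handler(*args, **kwargs)

    def http_method_not_allowed(self, *args, **kwargs):
        # Any method without a handler gets "405 Method Not Allowed"
        return Response('Method Not Allowed', status=405)


class GreetingView(SimpleView):
    def get(self, arg1):
        return Response('GET %s' % arg1)


view = GreetingView()
print(view.dispatch('GET', 'abc').content)  # GET abc
print(view.dispatch('POST', 'abc').status)  # 405
```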

Handy subclasses of View

Django comes with a number of useful subclasses of View that, just by inheriting from them, give you some of the functionality that often ends up as boilerplate in views. You saw TemplateView being used already. You'll probably want to base your views on TemplateView almost any time you're rendering a template to generate a response.

Another useful one is RedirectView. This can be used to redirect all requests. Example:

from django.core.urlresolvers import reverse
from django.views.generic import RedirectView

class MyRedirectView(RedirectView):
    url = reverse(...)

That is a complete view, and will return a redirect to url on any GET, POST, or HEAD request.

You can optionally set permanent = False to return a temporary redirect instead of the default permanent redirect, and query_string = True to include any query string from the incoming request on the redirect URL:

from django.core.urlresolvers import reverse
from django.views.generic import RedirectView

class MyRedirectView(RedirectView):
    url = reverse(...)
    permanent = False
    query_string = True
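The two flags interact roughly as in this plain-Python sketch. The function name redirect_target is hypothetical and this is not Django's code; it only models the behavior described above: a 301 when permanent is True, a 302 otherwise, and the incoming query string carried over only when query_string is True.

```python
def redirect_target(url, incoming_query_string='', permanent=True, query_string=False):
    """Hypothetical model of RedirectView's status code and Location choice."""
    # Permanent redirects use 301; temporary ones use 302
    status = 301 if permanent else 302
    location = url
    # Only append the request's query string when asked to
    if query_string and incoming_query_string:
        location = '%s?%s' % (url, incoming_query_string)
    return status, location


print(redirect_target('/new/', 'page=2'))
# (301, '/new/')
print(redirect_target('/new/', 'page=2', permanent=False, query_string=True))
# (302, '/new/?page=2')
```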

Decorators

Unfortunately, using decorators with class-based views isn't quite as simple as using them with the old function-based views.

Maybe you're used to doing this:

from django.contrib.auth.decorators import login_required
from django.shortcuts import render

@login_required
def myview(request):
    context = ...
    return render(request, 'index.html', context)

With class-based views, you have to decorate the .dispatch() method of the view class, which means you have to override it just to decorate it. You also need to wrap the decorator itself with method_decorator, because the decorators provided by Django expect to decorate plain view functions, not methods:

from django.contrib.auth.decorators import login_required
from django.shortcuts import render
from django.utils.decorators import method_decorator
from django.views.generic.base import View

class MyViewClass(View):

    def get(self, request, **kwargs):
        context = ...
        return render(request, 'index.html', context)

    @method_decorator(login_required)
    def dispatch(self, *args, **kwargs):
        return super(MyViewClass, self).dispatch(*args, **kwargs)

This is an area of class-based views that could use some improvement.
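Why the extra wrapping? A function decorator like login_required expects the first argument of the function it wraps to be the request, but a method's first argument is self. The pure-Python sketch below shows the idea behind method_decorator: bind self first, then hand the decorator a function whose first argument really is the request. The names require_user, simple_method_decorator, FakeRequest, and ProfileView are hypothetical, and Django's real method_decorator handles more details than this.

```python
import functools


def require_user(view_func):
    """Hypothetical function decorator: expects the request as the first argument."""
    @functools.wraps(view_func)
    def wrapper(request, *args, **kwargs):
        if not getattr(request, 'user', None):
            return 'redirect to login'
        return view_func(request, *args, **kwargs)
    return wrapper


def simple_method_decorator(decorator):
    """Turn a function decorator into one that works on methods (sketch)."""
    def method_wrapper(method):
        @functools.wraps(method)
        def wrapped(self, *args, **kwargs):
            # Bind self in a plain function so the decorator sees
            # the request as its first argument
            def bound(*a, **kw):
                return method(self, *a, **kw)
            return decorator(bound)(*args, **kwargs)
        return wrapped
    return method_wrapper


class FakeRequest(object):
    def __init__(self, user=None):
        self.user = user


class ProfileView(object):
    @simple_method_decorator(require_user)
    def dispatch(self, request):
        return 'hello %s' % request.user


print(ProfileView().dispatch(FakeRequest('alice')))  # hello alice
print(ProfileView().dispatch(FakeRequest()))         # redirect to login
```

Applied directly to the method, require_user would receive self in place of the request and check the wrong object.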

You could apply the decorator in urls.py without needing so much extra code:

urls.py:

from django.contrib.auth.decorators import login_required
...
    url(r'^$', login_required(MyViewClass.as_view()), name='myview'),
...

but that moves the policy from the view code to the URLconf, which is not where people will be expecting to have to look for it, so I wouldn't recommend it.


Passing arguments to the view

The method signature for get(), post(), etc. in a view class is:

def get(self, request, *args, **kwargs)

Any unnamed values captured in the URLconf regular expression are passed in args, and any named values are passed in kwargs, just like before.

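You can pass extra arguments to your view using the third element of your URLconf entry, the same as before, or using a new technique: passing them to the .as_view() call in your URLconf. Arguments passed this way are set as attributes on the view instance (e.g. self.extra_arg) rather than passed to get(). E.g.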

...
    url(r'^$', MyViewClass.as_view(extra_arg=3), name='myview'),
...

One warning - don't accidentally write MyViewClass(extra_arg=3).as_view(). That'll still appear to work, but that extra_arg is just thrown away.
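Here's a pure-Python model of what as_view() does with its keywords: it returns a view function that builds a fresh instance for every request and sets the keywords as attributes on it. ToyView is hypothetical, not Django's View. Django's as_view() may also reject keywords that aren't already attributes of the class (at least in later releases), so declaring a class-level default, as the sketch does, is a safe habit.

```python
class ToyView(object):
    """Hypothetical, simplified model of how View.as_view() uses its keywords."""
    extra_arg = None  # class-level default so the keyword is a known attribute

    def __init__(self, **kwargs):
        # Each keyword becomes an instance attribute
        for key, value in kwargs.items():
            setattr(self, key, value)

    @classmethod
    def as_view(cls, **initkwargs):
        for key in initkwargs:
            if not hasattr(cls, key):
                raise TypeError('%s() received an invalid keyword %r' % (cls.__name__, key))

        def view(request, *args, **kwargs):
            # A fresh instance per request, configured from as_view()'s keywords
            self = cls(**initkwargs)
            return self.get(request, *args, **kwargs)
        return view

    def get(self, request, *args, **kwargs):
        return 'extra_arg is %r' % self.extra_arg


view = ToyView.as_view(extra_arg=3)
print(view('fake request'))  # extra_arg is 3
```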


Where's the beef?

So far, all we've done is the same behavior, written using a different syntax. But class-based views enable a whole new level of functionality.

Suppose you've got a view that displays some data on a web page, and you write it as a class-based view. Maybe something like this:

from django.views.generic.base import TemplateView

class MyViewClass(TemplateView):
    template_name = 'index.html'

    def get(self, request, **kwargs):
        context = {}  # Lots of complex logic in here to compute 'context'
        return self.render_to_response(context)

Now you're asked to provide an HTTP API that returns the same data as JSON.

Start by refactoring your existing class slightly, moving your business logic out of the get() method:

from django.views.generic.base import TemplateView

class MyViewClass(TemplateView):
    template_name = 'index.html'

    def compute_context(self, request, **kwargs):
        context = {}  # Lots of complex logic in here to compute 'context'
        return context

    def get(self, request, **kwargs):
        return self.render_to_response(self.compute_context(request, **kwargs))

Now, write a new class that subclasses your original class, uses the same method to compute the data, but overrides get() with different rendering code:

import json

from django.http import HttpResponse

class MyJsonViewClass(MyViewClass):
    def get(self, request, **kwargs):
        data = self.compute_context(request, **kwargs)
        # Very naive way to put your data into json, but a good starting place
        content = json.dumps(data)
        return HttpResponse(content, content_type='application/json')

Add a new URL to urls.py pointing to your new class-based view, and you're done. All the logic you worked out earlier is still in use, and the power of subclassing let you provide the data in a new format almost effortlessly.
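The json.dumps() call in MyJsonViewClass is the naive part: json can't serialize values such as dates or Decimals, which commonly show up in a template context. A minimal sketch of one way to handle that is a json.JSONEncoder subclass; the ContextEncoder name and the choice to emit ISO 8601 strings and decimal strings are assumptions, not something Django does for you here. (Django also ships django.core.serializers.json.DjangoJSONEncoder, which handles similar types.)

```python
import datetime
import decimal
import json


class ContextEncoder(json.JSONEncoder):
    """Hypothetical encoder for values json.dumps() rejects by default."""
    def default(self, obj):
        # Dates and datetimes become ISO 8601 strings
        if isinstance(obj, (datetime.date, datetime.datetime)):
            return obj.isoformat()
        # Decimals become strings so no precision is lost
        if isinstance(obj, decimal.Decimal):
            return str(obj)
        # Anything else: let the base class raise TypeError
        return super(ContextEncoder, self).default(obj)


data = {'when': datetime.date(2011, 12, 29), 'price': decimal.Decimal('9.99')}
content = json.dumps(data, cls=ContextEncoder, sort_keys=True)
print(content)  # {"price": "9.99", "when": "2011-12-29"}
```

In MyJsonViewClass, you would pass cls=ContextEncoder to json.dumps().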


Class-based views for common policy

The previous example was still something you could have done almost as easily with function-based views, by refactoring your code into separate functions and calling them from all your views.

A more powerful use of the new class-based views is to provide common functionality for many views. If you have a site with many views, and they all inherit from a common view, then you have the potential to change behavior across the site by changing that one view.

Previously, you would probably have used middleware for this kind of thing. The problem with middleware is that it's completely hidden from the view code. When working on your view, you won't even know middleware is affecting things unless you go look at the settings and track down each piece of middleware configured there.

Furthermore, middleware affects every request, not just the views you really wanted it for.

With a common class-based view, every view affected is declared to inherit from that view, making it obvious that we're inheriting behavior from elsewhere. With a good IDE, you can even jump straight to that superclass to inspect it. Any view that doesn't need the common behavior doesn't have to inherit it.
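Here's a framework-free sketch of that idea: a shared base class applies a policy in one place (here, adding a Cache-Control header to every response), and only the views that subclass it pick up the behavior. BaseView, SiteBaseView, the header policy, and the Response class are all hypothetical stand-ins; with Django you would put the same logic in an overridden dispatch() on a View subclass.

```python
class Response(object):
    """Hypothetical minimal response with a headers dict."""
    def __init__(self, content):
        self.content = content
        self.headers = {}


class BaseView(object):
    """Stand-in for a framework View: routes by HTTP method name."""
    def dispatch(self, method, *args, **kwargs):
        return getattr(self, method.lower())(*args, **kwargs)


class SiteBaseView(BaseView):
    """Hypothetical site-wide policy, applied to every view that inherits it."""
    def dispatch(self, method, *args, **kwargs):
        response = super(SiteBaseView, self).dispatch(method, *args, **kwargs)
        response.headers['Cache-Control'] = 'no-cache'
        return response


class HomeView(SiteBaseView):      # opts in to the policy
    def get(self):
        return Response('home')


class HealthCheckView(BaseView):   # doesn't need it, so doesn't inherit it
    def get(self):
        return Response('ok')


print(HomeView().dispatch('GET').headers)         # {'Cache-Control': 'no-cache'}
print(HealthCheckView().dispatch('GET').headers)  # {}
```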


References

The only documentation page that really discusses class-based views in Django 1.3 is this one:

https://docs.djangoproject.com/en/1.3/topics/class-based-views/

Some of the rationale for the current design of class-based views, and pros and cons of some alternatives that were considered, are documented here:

https://code.djangoproject.com/wiki/ClassBasedViews

Beyond that, the best advice I can give is to go read the code. The code for the base View is surprisingly small, and can be found at django/views/generic/base.py.

OpenBlock Geocoder, Part 3: External Geocoders

December 28 2011 by Colin Copeland

The OpenBlock geocoder is powerful and robust. It uses PostGIS for spatial queries, can extract addresses from bodies of text, and can understand block and intersection notation. We've run into a few issues with it, however, including a low geocoding success rate. This is a tough problem to solve and depends on a lot of factors (the extent of street and block data in OpenBlock, the format of the street addresses, etc.), so your mileage may vary. Below, I put together a simple test that uses Google's Geocoding API as an alternative.

Note: This is the third post in our OpenRural series reviewing OpenBlock and its geocoder. You may wish to read Part 1: Data Model and Geocoding and Part 2: Text Parsing and Entity Extraction before proceeding.

Adding news with OpenBlock's geocoder

The Schema and NewsItem models provide OpenBlock with a generic data model to associate news with geographic locations. You can find a fairly extensive introduction in the official documentation, so we won't go into too much detail here.

Since a NewsItem requires a geographic point, let's use the OpenBlock geocoder to find 123 East Franklin Street:

>>> from ebpub.geocoder import SmartGeocoder
>>> geocoder = SmartGeocoder()
>>> location_name = '123 East Franklin Street'
>>> point = geocoder.geocode(location_name)['point']
>>> point.wkt
'POINT (-79.0553588124999891 35.9133110937499964)'

You'll notice that point has a wkt attribute. WKT, or Well-known text, is a text markup language for representing geometry objects. Here we have a POINT, but the language can represent many geometries, including LineStrings and Polygons.
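One detail worth noticing: WKT lists coordinates as x y, which for geographic data means longitude first, then latitude. That's also why the Google example further down builds Point(lng, lat). A tiny pure-Python sketch of formatting and parsing a WKT point follows; point_to_wkt and wkt_to_point are hypothetical helpers standing in for what GEOS does for you, and they only handle the simple POINT case.

```python
import re


def point_to_wkt(lng, lat):
    """Format a longitude/latitude pair as a WKT POINT (x = lng, y = lat)."""
    return 'POINT (%r %r)' % (lng, lat)


def wkt_to_point(wkt):
    """Parse 'POINT (x y)' back into a (lng, lat) tuple of floats."""
    match = re.match(r'^\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*$', wkt)
    if match is None:
        raise ValueError('not a WKT point: %r' % wkt)
    return float(match.group(1)), float(match.group(2))


lng, lat = wkt_to_point('POINT (-79.0553588124999891 35.9133110937499964)')
print(point_to_wkt(-79.05, 35.91))  # POINT (-79.05 35.91)
```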

We'll use the "Local News" schema in this example as it is pre-loaded in OpenBlock:

>>> from ebpub.db import models as ebpub
>>> schema = ebpub.Schema.objects.get(name='Local News')

Using this schema, we'll add a new NewsItem with the point created above:

>>> import datetime
>>> news = schema.newsitem_set.create(
...     title='Incident downtown',
...     description='Something happened downtown today!',
...     item_date=datetime.date.today(),
...     location=point,
...     location_name=location_name,
... )
>>> news.location.wkt
'POINT (-79.0553588124999891 35.9133110937499964)'

That was easy. Now we have a NewsItem that OpenBlock is aware of and can be plotted on a map. However, what do we do if we can't geocode the address?

Using an External Geocoder

If we already have a geographic point, then we can circumvent the geocoder entirely:

>>> from django.contrib.gis.geos import Point
>>> manual_point = Point(-79.0553588124999891, 35.9133110937499964)
>>> news = schema.newsitem_set.create(
...     title='Incident downtown',
...     description='Something happened downtown today!',
...     item_date=datetime.date.today(),
...     location=manual_point,
...     location_name=location_name,
... )
>>> news.location.wkt
'POINT (-79.0553588124999891 35.9133110937499964)'

This means we can also use an external geocoder. For example, we can use Google's Geocoding API with geopy. First, you'll need a Google Maps API key, which we'll use with geopy:

>>> GOOGLE_MAPS_API_KEY = '' # your Google Maps API key

Then we can use geopy to construct a new geocoder:

>>> from geopy import geocoders
>>> g = geocoders.Google(GOOGLE_MAPS_API_KEY)

And we can geocode our address:

>>> address = '123 East Franklin Street, Chapel Hill, NC'
>>> place, (lat, lng) = g.geocode(address)
>>> point = Point(lng, lat)
>>> point.wkt
'POINT (-79.0549350000000004 35.9136495999999994)'

You can even tap into OpenBlock's internals and build a Geocoder that OpenBlock can use:

from django.conf import settings
from django.contrib.gis.geos import Point

from geopy import geocoders
from geopy.geocoders.google import GQueryError

from ebpub.geocoder import Geocoder, DoesNotExist


class GoogleGeocoder(Geocoder):

    def __init__(self, *args, **kwargs):
        kwargs['use_cache'] = False # haven't implemented cache yet
        super(GoogleGeocoder, self).__init__(*args, **kwargs)
        self.geocoder = geocoders.Google(settings.GOOGLE_MAPS_API_KEY)

    def _do_geocode(self, location_string):
        try:
            place, (lat, lng) = self.geocoder.geocode(location_string)
        except (GQueryError, ValueError), e:
            raise DoesNotExist(unicode(e))
        location = {'point': Point(lng, lat)}
        return location

This is a proof-of-concept geocoder we're using with OpenRural. You can find it on GitHub. Using this geocoder with a sample dataset from the North Carolina Secretary of State Corporation Filings, I was able to increase the geocoding success rate from about 37% to 95%. Again, your mileage will vary, but it can be useful to test out. We can't use Google's API for everything, though. Normal users are limited to 2,500 requests per day, and business accounts are allotted 100,000. Additionally, Google requires you to display any points geocoded with their API on a Google Map, so you'll need to evaluate your needs before deciding to use Google's API.
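Given the daily request limit, one way to combine the two approaches is to try OpenBlock's own geocoder first and only spend a Google request when it fails. The sketch below is framework-free: the geocoders are plain callables that return (lng, lat) or raise, and FallbackGeocoder, GeocodeFailed, and the simple quota counter are hypothetical. In an OpenBlock install, the callables would wrap something like SmartGeocoder().geocode and a Google-backed geocoder such as the one above.

```python
class GeocodeFailed(Exception):
    """Hypothetical error raised when no geocoder can place an address."""


class FallbackGeocoder(object):
    """Try a local geocoder first; use a rate-limited external one as a fallback."""

    def __init__(self, local, external, daily_limit=2500):
        self.local = local            # callable: address -> (lng, lat); raises on failure
        self.external = external      # same contract, but each call costs quota
        self.daily_limit = daily_limit
        self.external_calls = 0       # in real use, reset this once per day

    def geocode(self, address):
        try:
            return self.local(address)
        except Exception:             # sketch: treat any local error as a miss
            pass
        if self.external_calls >= self.daily_limit:
            raise GeocodeFailed('external quota exhausted for %r' % address)
        self.external_calls += 1
        try:
            return self.external(address)
        except Exception:
            raise GeocodeFailed('no geocoder could place %r' % address)


def local(address):
    raise LookupError('not in OpenBlock data')   # hypothetical local miss


def external(address):
    return (-79.054935, 35.9136496)              # hypothetical external hit


g = FallbackGeocoder(local, external, daily_limit=1)
print(g.geocode('123 East Franklin Street, Chapel Hill, NC'))
```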

Using Django and Celery with Amazon SQS

December 19 2011 by Tobias McNulty

Amazon's Simple Queue Service (SQS) is a relatively new offering in the family of Amazon Web Services (AWS). It's also an appealing one, because it promises to quickly and easily replace a common component of the stack in a typical web application, removing the need to run a separate queue server like RabbitMQ. While RabbitMQ, the typical favorite for Celery users, is not necessarily difficult to install or maintain, removing it from a web application's stack means one less component that might fail. Offloading that service to AWS might also prove financially advantageous, especially for applications with a small to moderate queue volume.

While it's quite easy to use Celery with Amazon's Simple Queue Service (SQS), there's currently not a lot of information out there about how to do it. There's this post on the celery-users list that didn't leave me with much hope, and this question on StackOverflow that sounded slightly more promising. I still couldn't find a step-by-step how-to, however, and it ended up being quite easy, so here's my take:

  1. Upgrade to the latest versions of kombu, celery, and django-celery. At the time of this writing, those versions are 1.5.1, 2.4.5, and 2.4.2, respectively:

    pip install kombu==1.5.1
    pip install celery==2.4.5
    pip install django-celery==2.4.2
    
  2. Add the following lines to settings.py (or local_settings.py depending on your setup):

    BROKER_TRANSPORT = 'sqs'
    BROKER_TRANSPORT_OPTIONS = {
        'region': 'us-east-1',
    }
    BROKER_USER = AWS_ACCESS_KEY_ID
    BROKER_PASSWORD = AWS_SECRET_ACCESS_KEY
    

    In the above, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY should point to the appropriate AWS access key and secret for the account you want to use. Pro tip: Use AWS's Identity and Access Management (IAM) to set up an API key and secret that only have access to the services your web application will use (typically one or more of SQS, SES, and SimpleDB).

  3. Finally, if you'll be running multiple servers or environments on the same AWS account (e.g., two different web apps or staging and production environments of the same app), you may want to customize the SQS queue name being used (the default is "celery"). To make this change, add the following lines to your settings.py (or again, local_settings.py):

    CELERY_DEFAULT_QUEUE = 'celery-myapp-production'
    CELERY_QUEUES = {
        CELERY_DEFAULT_QUEUE: {
            'exchange': CELERY_DEFAULT_QUEUE,
            'binding_key': CELERY_DEFAULT_QUEUE,
        }
    }
    

For the curious, Celery's support for SQS lies in the underlying Kombu library, the latest version of which includes a transport for SQS. While some posts I found (including the StackOverflow question) suggest using the BROKER_URL syntax for pointing to AWS, I found it simpler to use the BROKER_USER and BROKER_PASSWORD variables. I also saw some reports that slashes in your API secret could confuse the underlying URL parser, and since my API secret happened to include a number of slashes, I went straight to using BROKER_USER and BROKER_PASSWORD.
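The slash problem is easy to reproduce with Python's own URL parser: an unescaped / in the secret ends the network-location part of the URL early. Percent-encoding the key and secret keeps them intact. This sketch assumes the sqs://KEY:SECRET@ URL shape used in Celery's SQS documentation, and the credentials are made up. Whether a given Kombu version decodes the escaped secret is something to verify, which is one more reason BROKER_USER and BROKER_PASSWORD are the simpler route.

```python
try:
    from urllib.parse import quote, urlsplit  # Python 3
except ImportError:
    from urllib import quote                  # Python 2
    from urlparse import urlsplit

key = 'AKIAEXAMPLEKEY'
secret = 'abc/def+ghi'  # made-up secret containing a slash

naive = 'sqs://%s:%s@' % (key, secret)
print(urlsplit(naive).netloc)  # AKIAEXAMPLEKEY:abc (the secret is cut off at the slash)

safe = 'sqs://%s:%s@' % (quote(key, safe=''), quote(secret, safe=''))
print(safe)                    # sqs://AKIAEXAMPLEKEY:abc%2Fdef%2Bghi@
print(urlsplit(safe).netloc)   # AKIAEXAMPLEKEY:abc%2Fdef%2Bghi@
```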

Anyway, I hope this helps someone else looking to solve the same problem, and don't hesitate to comment if you run into any issues or have a better way to go about this!

OpenBlock Geocoder, Part 2: Text Parsing and Entity Extraction

December 16 2011 by Colin Copeland

This is the second post in our OpenRural series reviewing OpenBlock and its geocoder. OpenBlock Geocoder, Part 1: Data Model and Geocoding covers the internals of the OpenBlock geocoder and its geocoding capabilities. As this post builds upon topics covered there, you may wish to read Part 1 before proceeding. In this post we step back from the internals of the geocoder and explore how to use it along with other OpenBlock tools to parse unstructured text.

I'd also like to give a shout out here to Paul Winkler who was kind enough to answer questions and point me in the right direction on the topics below. Thanks Paul!

The Problem

OpenBlock's original design is centered around providing news at a hyper-local level. That is, down to your own city block. This allows interested citizens to see events ranging from police incidents to restaurant inspections to local news articles, all aggregated on a map of their block. OpenBlock provides scraping tools to assist in downloading this data from the web, but the obvious problem here is that most data isn't packaged or tagged with geographic information. Let's look at an example article teaser from The Daily Tar Heel in Chapel Hill, NC:

No. 4 North Carolina led Evansville 63-27 with just more than 14 minutes to go in the first half when senior forward Tyler Zeller scored his 999th career point at the Smith Center on Tuesday night.

The article mentions the game at the Smith Center, which is the location we want to extract and plot on a map. This is where OpenBlock's utilities for ingesting unstructured text help.

Places

Places are simple models containing only a name and geographic point. OpenBlock implements a mechanism to find places defined in the database from a body of text. For example, say we have the following string we'd like to parse:

>>> message = 'A good movie is playing at the Varsity Theater in Chapel Hill tonight.'

OpenBlock can extract "Varsity Theater" if we define it as a Place. You can create and import places in the OpenBlock admin, but to keep things simple, we'll just create one here:

Here we created a new Point of Interest place (which is loaded by default on any OpenBlock install) geocoded to 123 East Franklin Street. Now we need a way to parse places from strings. Most of this functionality is found in ebdata, which contains a Natural Language Processing package, nlp. We can use its place_grabber to extract matching places:

We can feed this right back into the Place model to retrieve the database objects and their geographic locations:

The parser is case sensitive, however, so it'll fail if it's not an exact match:

>>> grabber("VARSITY THEATER")
[]

Obviously this is a brute-force method and requires you to pre-load all places of interest into the database beforehand. It's pretty rudimentary, but does provide this functionality out-of-the-box.

Locations

OpenBlock can also extract locations defined in the database. We already have cities loaded, so we'll use them in this example. Just like the place grabber, the location grabber is case sensitive, so we'll define a location synonym with the proper case:

>>> from ebpub.db.models import Location, LocationSynonym
>>> ch = Location.objects.get(name='CHAPEL HILL')
>>> LocationSynonym(pretty_name='Chapel Hill', location=ch).save()

By default, the location grabber ignores the "city" and "borough" location types. To keep things simple, we'll just create one that includes all location types:

>>> grabber = places.location_grabber(ignore_location_types=[])

Now we can use the grabber to extract locations:

>>> grabber(message)
[(50, 61, 'Chapel Hill')]

If you plan to parse a lot of text in succession, the OpenBlock grabbers cache the locations/places on instantiation. So you won't hit the database after the initial run. Cool!
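To make the grabber behavior concrete, here is a brute-force, pure-Python sketch of the idea: load the known names once when the grabber is created (the cache), then scan text for exact, case-sensitive matches and return (start, end, name) tuples like the ones above. SimpleGrabber is hypothetical and far simpler than ebdata.nlp's implementation.

```python
import re


class SimpleGrabber(object):
    """Hypothetical, simplified place/location grabber."""

    def __init__(self, names):
        # Load the names once up front (OpenBlock would query the database here)
        # and try longer names first so longer matches win
        self.names = sorted(names, key=len, reverse=True)
        self.pattern = re.compile(
            r'\b(%s)\b' % '|'.join(re.escape(n) for n in self.names))

    def __call__(self, text):
        # Case-sensitive: "VARSITY THEATER" won't match "Varsity Theater"
        return [(m.start(), m.end(), m.group(1)) for m in self.pattern.finditer(text)]


grabber = SimpleGrabber(['Varsity Theater', 'Chapel Hill'])
message = 'A good movie is playing at the Varsity Theater in Chapel Hill tonight.'
print(grabber(message))            # [(31, 46, 'Varsity Theater'), (50, 61, 'Chapel Hill')]
print(grabber('VARSITY THEATER'))  # []
```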

Addresses

ebdata.nlp can also parse addresses. For example, let's use a simple string:

>>> from ebdata.nlp.addresses import parse_addresses
>>> parse_addresses('The Varsity Theater is located at 123 N Franklin St')
[('123 N Franklin St', '')]

Under the hood, OpenBlock uses a large regular expression to do this, so it's not actually hitting the database or attempting to do geocoding. You'll notice that it returns a 2-item tuple. The second item is for the city:

>>> parse_addresses('The individual was seen on 123 N Franklin St in Chapel Hill')
[('123 N Franklin St', 'Chapel Hill')]

It can parse block locations too:

>>> parse_addresses('The construction is on the 100 block of Franklin St.')
[('100 block of Franklin St.', '')]

And intersections:

>>> parse_addresses('The incident occurred at the intersection of Franklin and Hillsborough')
[('Franklin and Hillsborough', '')]
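OpenBlock's address expression handles many more cases, but a toy version shows the idea: a house number (optionally followed by "block of"), an optional direction, a capitalized street name, and a street suffix. TOY_ADDRESS and toy_parse_addresses are hypothetical illustrations, not OpenBlock's code, and like the real parser they work purely on text without touching the database.

```python
import re

# Hypothetical, deliberately small pattern; OpenBlock's real one is far larger
TOY_ADDRESS = re.compile(
    r'\b(\d+(?: block of)? (?:(?:N|S|E|W) )?[A-Z][a-z]+ (?:St|Ave|Rd|Dr)\.?)')


def toy_parse_addresses(text):
    """Return every substring that looks like an address or block reference."""
    return TOY_ADDRESS.findall(text)


print(toy_parse_addresses('The Varsity Theater is located at 123 N Franklin St'))
# ['123 N Franklin St']
print(toy_parse_addresses('The construction is on the 100 block of Franklin St.'))
# ['100 block of Franklin St.']
```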

It all comes together with the geocoder:

Conclusion

As you can see, OpenBlock provides a few useful utilities to parse unstructured text. They're fairly limited and, especially in the case of the address parser, will most likely return a lot of false positives. But I think OpenBlock has provided a great starting point. Stay tuned for more posts on the inner workings of the OpenBlock project!

OpenBlock Geocoder, Part 1: Data Model and Geocoding

December 12 2011 by Colin Copeland

As Tobias mentioned in Scraping Data and Web Standards, Caktus is collaborating with the UNC School of Journalism to help develop Open Rural (the code is on GitHub). Open Rural hopes to help rural newspapers in North Carolina leverage OpenBlock. This blog post is the first of several covering the internals of OpenBlock and, specifically, the geocoder.

OpenBlock Data Model

The OpenBlock geocoder can only geocode from the data it has. It doesn't leverage a 3rd-party API or service. It only uses what's loaded in PostgreSQL (with PostGIS and GeoDjango) and, in this example, what comes from the US Census Bureau and local city and county GIS offices.

Further, the imported data is typically filtered by a bounding box setting in METRO_LIST. This setting, extent, is a tuple of leftmost longitude, lower latitude, rightmost longitude, and upper latitude. It defines a bounding box: the range of latitudes and longitudes that are relevant to your area. A small or restrictive box will limit imported ZIP code and block data to areas that fall within the box.
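The extent check amounts to a simple containment test. Here's a plain-Python sketch, assuming the (leftmost longitude, lower latitude, rightmost longitude, upper latitude) order described above; in_extent is a hypothetical helper, the example extent and points are made up, and the helper takes the min/max of each pair so it tolerates corners given in either order.

```python
def in_extent(lng, lat, extent):
    """Return True if (lng, lat) falls inside a METRO_LIST-style extent."""
    left, bottom, right, top = extent
    # Normalize in case the corners were entered in the opposite order
    min_lng, max_lng = min(left, right), max(left, right)
    min_lat, max_lat = min(bottom, top), max(bottom, top)
    return min_lng <= lng <= max_lng and min_lat <= lat <= max_lat


example_extent = (-79.10, 35.90, -79.00, 35.95)        # made-up bounding box
print(in_extent(-79.0553588, 35.9133111, example_extent))  # True
print(in_extent(-78.9, 35.92, example_extent))             # False
```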

Let's look at an example with these shapefiles:

We'll start with a restrictive extent that only consists of downtown Chapel Hill:

METRO_LIST = (
    {
        # Extent of the region, as a longitude/latitude bounding box.
        'extent': (-79.066272, 35.91671, -79.040481, 35.910663),
        # ...
    },
)

This selection loaded 2 ZIP codes:

$ django-admin.py import_nc_zips
Importing zip codes...
# ...
Skipping 27511, out of bounds
Skipping 27513, out of bounds
Created ZIP Code 27514 
Created ZIP Code 27516 
Skipping 27517, out of bounds
Skipping 27519, out of bounds
# ...
Created 2 zipcodes.

And limited the block data as well:

$ django-admin.py import_county_streets 37135
Importing blocks, this may take several minutes ...
Created 73 blocks
Populating streets and fixing addresses, these can take several minutes...
Populating the streets table
streets: created: 28
block_intersections: created: 160
Done.

Restricting the area will limit the ability of the geocoder. In this case, for example, it can geocode the intersection of Franklin and Henderson, which is right downtown, but not Franklin and Estes (don't worry, we'll get into more geocoding details in the next section). A map helps illustrate this more clearly. Below you can see the bounding box with pins on the two intersections:



[Map: OpenRural - Downtown Chapel Hill]

If we increase the bounding box, we'll get a lot more data:

METRO_LIST = (
    {
        # Extent of the region, as a longitude/latitude bounding box.
        'extent': (-79.165922, 35.829095, -78.978468, 36.02426),
        # ...
    },
)

With an extent that encompasses all of Chapel Hill, the importer loaded 9 ZIP codes, 4302 blocks, 1699 streets, and 7189 intersections. Here's a map illustrating the larger extent:



[Map: OpenRural - Orange County, NC]

It's up to the maintainer of an OpenBlock install to determine which extent to use as it is based on the specifics of the application. A large extent will import more ZIP codes and blocks and, therefore, will slow down geospatial queries and may include unwanted geographic areas.

Street

Now that we have Orange County, NC data loaded, let's investigate this data with the OpenBlock models.

The Street model contains a catalog of all loaded streets. It's a simple model with only a few fields:

  • street
  • pretty_name
  • street_slug
  • suffix
  • city
  • state

In Orange County, NC, we can see that the street data spans 4 cities (plus some streets with no city):

>>> from ebpub.streets.models import Street
>>> Street.objects.order_by('city').values_list('city', flat=True).distinct()
[u'', u'CARRBORO', u'CHAPEL HILL', u'DURHAM', u'HILLSBOROUGH']

Some streets cross city lines and therefore contain two entries:

>>> Street.objects.filter(street_slug='rosemary-st').values_list('city', flat=True)
[u'CARRBORO', u'CHAPEL HILL']

And, for example, if we're looking for Franklin St. in Chapel Hill, NC, we can filter for it here:

Blocks

Blocks are fundamental to OpenBlock and are used by the geocoder. OpenBlock defines a block as "a segment of a single street between one side street and another side street." The Block model is slightly more intricate than Street, but each entry basically represents the address range of a street for each block segment.

To start, we can see that Franklin St. is divided into 32 blocks:

>>> from ebpub.streets.models import Block
>>> Block.objects.filter(street_slug='franklin-st').count()
32

It's sectioned into an east and west segment:

>>> Block.objects.filter(street_slug='franklin-st').order_by('street_pretty_name').values_list('street_pretty_name', 'predir').distinct()
[(u'Franklin St.', u'W'), (u'Franklin St.', u'E')]

And its addresses range from 100 to 1899:

>>> from django.db.models import Min, Max
>>> Block.objects.filter(street_slug='franklin-st').aggregate(Min('from_num'), Max('to_num'))
{'from_num__min': 100, 'to_num__max': 1899}
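Since each block carries an address range, finding the block for a house number is conceptually a range lookup, and a point within the block can then be estimated by interpolating along the block's line. The sketch below is plain Python with hypothetical data structures (tuples instead of Block rows, a straight two-point segment instead of a real line geometry, made-up coordinates). Linear interpolation is a common technique for this, but whether OpenBlock computes its point exactly this way is an assumption.

```python
class BlockNotFound(Exception):
    """Hypothetical stand-in for OpenBlock's 'no matching block' errors."""


def locate_address(number, blocks):
    """blocks: list of (from_num, to_num, (lng1, lat1), (lng2, lat2)) tuples."""
    for from_num, to_num, start, end in blocks:
        if from_num <= number <= to_num:
            # Fraction of the way along the block's address range
            span = to_num - from_num
            fraction = float(number - from_num) / span if span else 0.0
            lng = start[0] + fraction * (end[0] - start[0])
            lat = start[1] + fraction * (end[1] - start[1])
            return lng, lat
    raise BlockNotFound('no block covers number %d' % number)


# Made-up block: numbers 100-199 along a short east-west segment
blocks = [(100, 199, (-79.0560, 35.9130), (-79.0540, 35.9130))]
print(locate_address(150, blocks))
```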

So we can find the block that contains the 123 address:

Also, on a side note, it's possible for some blocks to span cities:

Geocoding

Now that we have a basic understanding of how the data is stored within OpenBlock, let's do some geocoding. Most of these examples will use the SmartGeocoder class. SmartGeocoder delegates to specific geocoders (AddressGeocoder, BlockGeocoder, and IntersectionGeocoder) based on how it interprets the string with regular expressions.
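A rough, framework-free sketch of that delegation idea: classify the string with a few regular expressions, then hand it to the matching geocoder. The patterns and classify_location are hypothetical simplifications, not SmartGeocoder's actual expressions; "address", "block", and "intersection" stand in for AddressGeocoder, BlockGeocoder, and IntersectionGeocoder.

```python
import re

# Hypothetical patterns, much simpler than OpenBlock's
BLOCK_RE = re.compile(r'^\d+ block of ', re.IGNORECASE)
INTERSECTION_RE = re.compile(r'\s(?:and|&|at)\s', re.IGNORECASE)
ADDRESS_RE = re.compile(r'^\d+\s+\S')


def classify_location(location):
    """Decide which specialized geocoder should handle a location string."""
    if BLOCK_RE.search(location):
        return 'block'
    if INTERSECTION_RE.search(location):
        return 'intersection'
    if ADDRESS_RE.search(location):
        return 'address'
    return 'unknown'


for text in ['123 East Franklin Street', '100 block of Franklin St.', 'Franklin and Henderson']:
    print('%s -> %s' % (text, classify_location(text)))
```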

Addresses

To start, let's geocode "123 East Franklin Street":

This one was pretty easy for the geocoder to parse and find. You can see that not only has it found the associated block, but it also knows the exact geographic point. However, this will fail if passed a non-existent address number (InvalidBlockButValidStreet):

In this case, the geocoder was able to extract the address, but it failed to find the associated block in the database. Non-existent streets also fail (DoesNotExist):

Intersections

The geocoder can locate intersections too:

Notice how the intersection field is populated, rather than block. This will raise a DoesNotExist exception when an intersection is not found:

Street Misspellings

OpenBlock provides a model, StreetMisspelling, to define street aliases. This allows you to map a bad street name to a good street name that exists in the database:

Now geocoding "Glen Haven" will find "Glenhaven".
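At its core, this is a lookup table consulted before searching for the street. A minimal pure-Python sketch of the idea follows; the MISSPELLINGS dictionary and normalize_street function are hypothetical, whereas OpenBlock stores the pairs in the StreetMisspelling table.

```python
# Hypothetical alias table: misspelled (uppercased) name -> correct name
MISSPELLINGS = {
    'GLEN HAVEN': 'GLENHAVEN',
}


def normalize_street(name):
    """Return the canonical street name, correcting known misspellings."""
    # Uppercase and collapse runs of whitespace before the lookup
    key = ' '.join(name.upper().split())
    return MISSPELLINGS.get(key, key)


print(normalize_street('Glen Haven'))  # GLENHAVEN
print(normalize_street('Franklin'))    # FRANKLIN
```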

Multiple Cities

By default, OpenBlock is configured to work with a single city, which is defined in METRO_LIST:

# Metros. You almost certainly only want one dictionary in this list.
# See the configuration docs for more info.
METRO_LIST = (
    {
        # Extent of the region, as a longitude/latitude bounding box.
        'extent': (-79.165922, 35.829095, -78.978468, 36.02426),

        # The major city in the region.
        'city_name': 'Chapel Hill', 
    },
)

The geocoder will fail if it locates a street that's associated with a city unknown to OpenBlock. For example, 100 Pine Street is in Carrboro and not Chapel Hill:

This street exists in the database due to our extent covering most of Orange County. Since we've set up OpenBlock to encompass an entire county, rather than a single city, we need to define additional cities. This can be accomplished one of two ways:

  • Add additional dictionaries to METRO_LIST for each city
  • Import city locations into the database and tell OpenBlock to refer to these

We imported Orange County city boundary data above, so we'll use the latter:

METRO_LIST = (
    {
        # Extent of the region, as a longitude/latitude bounding box.
        'extent': (-79.165922, 35.829095, -78.978468, 36.02426),

        # Set this to True if the region has multiple cities.
        # You will also need to set 'city_location_type'.
        'multiple_cities': True,

        # The major city in the region.
        'city_name': 'Chapel Hill',

        # Slug of an ebpub.db.LocationType that represents cities.
        # Only needed if multiple_cities = True.
        'city_location_type': 'cities',
    },
)
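The comments in this configuration state a rule that is easy to check automatically: if multiple_cities is True, city_location_type must also be set. A small hypothetical check like the one below (not part of OpenBlock) could run from a test to catch a missing key early.

```python
def check_metro(metro):
    """Return a list of problems found in one METRO_LIST dictionary."""
    problems = []
    # Rule taken from the settings comments: multiple_cities needs
    # city_location_type as well.
    if metro.get("multiple_cities") and not metro.get("city_location_type"):
        problems.append(
            "'multiple_cities' is True but 'city_location_type' is not set")
    return problems


METRO_LIST = (
    {
        'extent': (-79.165922, 35.829095, -78.978468, 36.02426),
        'multiple_cities': True,
        'city_name': 'Chapel Hill',
        'city_location_type': 'cities',
    },
)

for metro in METRO_LIST:
    print(check_metro(metro))
```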

Here we enabled multiple_cities and told OpenBlock that the slug of its city location type is cities. Now 100 Pine Street will geocode properly:

What's Next

Now that we've had an overview of the geocoder, we'll jump into OpenBlock's place, location, and address parser. Stay tuned!

Update: Read more in OpenBlock Geocoder, Part 2: Text Parsing and Entity Extraction.

Scraping Data and Web Standards

December 06 2011 by Tobias McNulty

We're currently involved in a project with the UNC School of Journalism that hopes to help rural newspapers in North Carolina leverage OpenBlock.  The project is called OpenRural, and if you're a software developer you can find the latest code on GitHub.

OpenBlock needs geographic data to display, and that data can come from a variety of sources.  We've found a number of web sites that offer geographically interesting data to NC residents, and in this post I'd like to discuss my experience attempting to scrape (that is, programmatically navigate and extract data from) the Chapel Hill Police Department's (CHPD's) online database of crime reports.

The CHPD site advertises itself as powered by "Sungard Public Sector OSSI's P2C engine," and a quick Google for "P2C engine" shows that Chapel Hill is not the only city or county in North Carolina that happens to use this product.  Unfortunately, scraping the data on this site proved to be a non-trivial endeavor.

I opted to host and run my scraper script on ScraperWiki, which is a great tool for writing, testing, and running scraper scripts in a variety of scripting languages.  The site even exposes the scraped data through an API, so it could potentially be used as an abstraction layer between the scraped sites and OpenBlock (or any other consumer of the data).  The current state of the script can be found here:

https://scraperwiki.com/scrapers/chapel_hill_police_reports/

The script uses the Python mechanize library to navigate the site being scraped, and BeautifulSoup to find and extract data on the pages retrieved.  After telling mechanize to click the "I Agree" button on the CHPD web site's landing page, it was easy enough to submit the search form for the current day and return a listing of results.

While getting the initial list of results was fairly trivial, one issue I ran into when writing the scraper is that the site uses an odd method of retrieving and paginating results.  Looking at the HTML source, you will see that the search form is submitted by a small piece of JavaScript, like so:

function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}

It turns out this little function is used to do quite a lot.  There are calls to it for everything from sorting, to pagination, to links to other pages on the site.  It effectively works by storing the requested action in two hidden form inputs on the page (__EVENTTARGET and __EVENTARGUMENT) and then calling submit() on the form.
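The same effect can be reproduced in Python without a browser: collect the page's hidden inputs, fill in the two event fields the way __doPostBack does, and URL-encode the result as a POST body. This is a standard-library sketch, not the scraper's actual code. The control name ctl00$grid and the argument Page$2 are hypothetical stand-ins for the values found in a page's __doPostBack calls. __doPostBack is ASP.NET's postback helper, and such pages typically also carry state in hidden fields like __VIEWSTATE, which is why every hidden input is sent back.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlencode


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def postback_body(html, event_target, event_argument=""):
    """Build the form body that __doPostBack(target, argument) would submit."""
    parser = HiddenInputParser()
    parser.feed(html)
    fields = dict(parser.fields)
    # Mimic the two assignments __doPostBack makes before theForm.submit().
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return urlencode(fields)


# A trimmed, hypothetical page; real pages carry much larger state values.
PAGE = """
<form method="post" action="results.aspx" id="form1">
  <input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
  <input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dDwtMTA4" />
</form>
"""

body = postback_body(PAGE, "ctl00$grid", "Page$2")
print(parse_qs(body))
```

The resulting string would be sent as the body of a POST request to the form's action URL.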

You may have also noticed that the form has method="post" rather than method="get", which means the web browser will send an HTTP POST (rather than an HTTP GET) every time you modify the form and click the Search button.  Per the HTTP/1.1 specification, POST requests should be used for requests that modify data on the server, whereas GET requests should be used to retrieve information at a given URL.  You can also tell that the site uses POST instead of GET by inspecting the URL in your browser; pages that use GET will typically have a portion of their URL that starts with a question mark and is followed by key/value pairs.  The link to the Google search above is an example of the GET method. Searching a site is by definition a retrieval operation (and typically does not involve modifying data on the server), so well-written search forms should use the GET rather than the POST HTTP method.
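By contrast, a search form that uses GET puts its parameters in the URL itself, so a results page can be bookmarked, linked to, or fetched again with a plain request. A minimal sketch with the standard library; the example.com address and the parameter names are hypothetical:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(base, **params):
    """Build a GET URL with the search parameters in its query string."""
    # Sort the parameters so the same search always produces the same URL.
    return base + "?" + urlencode(sorted(params.items()))


url = search_url("https://example.com/search", q="P2C engine", page=2)
print(url)

# The parameters can be recovered from the URL alone.
print(parse_qs(urlsplit(url).query))
```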

Confusing POST and GET is a fairly elementary problem, but it's one that we see far too often on the web.  If you've ever hit the back button and been prompted by your browser to re-submit a form, with a warning that doing so may modify data on the server, the site you're using is probably not using the GET and POST HTTP methods properly.

In the case of the CHPD site, while it was easy enough to set the values of the hidden form inputs and re-submit the form using POST (after finding this post on StackOverflow, at least), for some reason the site still returns the first page of results to mechanize (even though it properly paginates in a real web browser). I'm still working on it, but in the meantime, check out the code and let me know if you have any ideas. :-)

Django Without the Web

October 24 2011 by Dan Poirier

One of the things I like best about Django is how easy its ORM makes it to work with databases. Too bad Django is only for web applications. Sure, you could deploy a Django app and then make use of it from a non-web application using a REST API, but that would be too awkward.

But there is an easy way to use Django without the web! Here's the trick - write your application as Django management commands. Then you can run it from the command line. Just like 'manage.py syncdb' or 'manage.py migrate', you can run 'manage.py my_own_application' and your application has access to the full power of Django ORM.

Adding a new Django management command is surprisingly easy:

  1. Add a management/commands directory to your application.
  2. Create an anything.py file containing a class named Command that extends django.core.management.base.BaseCommand or a subclass.
  3. Write a handle method that runs your application.
  4. Run 'manage.py anything'.

Here's an example of a trivial command:

from django.core.management.base import BaseCommand

class Command(BaseCommand):
    def handle(self, *args, **options):
        print("Hello, world")

Create a management/commands directory in your application and save this there as 'hello.py'.

Now try it:

$ ./manage.py hello
Hello, world
$

How about doing something useful?  Here's an example that prints out all of your invoices, so you can see how easy it is to access your data:

from django.core.management.base import BaseCommand
from appname.models import Invoice

class Command(BaseCommand):
    def handle(self, *args, **options):
        print("Invoices")
        for invoice in Invoice.objects.order_by('date'):
            print(u"%s %s" % (invoice.date, invoice.summary))

I've used custom management commands to do things like importing data where something more complicated than loading a fixture was needed.

For more details, see the Django documentation.

Caktus 2012 Summer Internship Program

October 12 2011 by Nicole Foster

I'm excited to announce that Caktus is looking for candidates for our summer internship program. It is a 12-week paid position in our Carrboro, NC office. We're within driving distance of UNC Chapel Hill, NC State University in Raleigh, and Duke in Durham, so students from all parts of the NC Research Triangle are welcome to apply.

We are looking for a web developer who enjoys working on a team and is excited to work on new and diverse projects. While working with us you will get to work on Django-powered web applications, learn about test-driven development and other agile methodologies, perform front-end development in HTML, CSS and JavaScript (jQuery) and become familiar with Linux (Debian-flavor) desktop and server systems. Check out the full job posting here.

If you'd like to spend your summer working with some great people on interesting projects please email us at jobs+website@caktusgroup.com with your resume and, if applicable, links to samples of code you have written. Kindly include a brief note describing why you would be a great fit for this opportunity.

Caktus Hosts 3rd Django Sprint in North Carolina

October 10 2011 by Nicole Foster

Here at Caktus, we love Django and use it to make all of our web applications. To help support the Django community, we are hosting a development sprint on November 12th and 13th at our office in Carrboro, NC in preparation for the 1.4 release. The sprint is a great excuse for people to get together and focus their undivided attention on improving Django. You will be helping out by providing bug fixes, improving the documentation and also adding features to existing packages.

If you would like to participate in the sprint, no previous experience is necessary and this would be a great time to start contributing.  Mark wrote a great blog piece about how to get started contributing to Django through sprinting that you can read here.

We'll be here at 9:00 AM both days, and the day usually ends between 4:00 and 5:00 PM, depending on the momentum; afterwards everyone gets together for dinner and drinks. If you would like to attend, please RSVP on Eventbrite, and if you cannot make it to the office, please submit your name to the online roster.

We look forward to seeing you!

Caktus Group Welcomes Designer and Front End Developer Julia Elman

September 30 2011 by Tobias McNulty

I'm delighted to announce that Julia Elman has joined our growing team of web developers here at Caktus. Julia started her design career almost 10 years ago in an internal marketing group, and first learned about Django at the SXSW Interactive Festival in 2008. Prior to joining the Caktus team, Julia worked at the Lawrence Journal World (the birthplace of Django) and as a freelance designer.

Caktus is a seasoned team of web developers that creates interactive, content-rich sites and applications with the Django web framework. We put a strong emphasis on best practices, employ an agile method, and also actively participate in the Django development community.

For more information about Caktus and our team, check out our newly updated team page!

Bulk inserts in Django

September 20 2011 by Dan Poirier

I recently found a way to speed up a large data import far more than I expected.

The task was to read data from a text file and create data records in Django, and the naive implementation was managing to import about 55 records per second, which was going to take far too long given the amount of data that needed to be imported.

My co-worker Karen Tracey suggested changing to bulk inserts. Instead of creating and saving one Django record at a time, we'd create a whole batch of Django objects, then save them all in one SQL operation. I figured reducing the number of database round-trips would speed things up somewhat, but was not prepared for the actual numbers - I'm consistently getting around two orders of magnitude improvement compared to single record inserts.
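The gain comes from replacing many INSERT round-trips with one. As a self-contained illustration, here is the same idea using Python's built-in sqlite3 module rather than Django or bulkops: a single INSERT statement with one VALUES group per row. The helper name and the book table are hypothetical, and the table and column names are interpolated directly into the SQL, so they must come from trusted code, never from user input.

```python
import sqlite3


def insert_many_rows(conn, table, columns, rows):
    """Insert every tuple in `rows` using a single multi-row INSERT."""
    if not rows:
        return 0
    # One "(?, ?, ...)" group per row; values are passed as parameters.
    group = "(" + ", ".join("?" for _ in columns) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join(group for _ in rows))
    params = [value for row in rows for value in row]
    with conn:  # commit the whole batch as one transaction
        conn.execute(sql, params)
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, year INTEGER)")
insert_many_rows(conn, "book", ["title", "year"],
                 [("First", 2001), ("Second", 2002), ("Third", 2003)])
print(conn.execute("SELECT COUNT(*) FROM book").fetchone()[0])
```

Older SQLite builds cap a single statement at 999 parameters, which is one more reason to keep each batch to a few hundred rows.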

As I scaled up, I made one more change - instead of doing the insert in one batch, I limited each batch to a few hundred records. I didn't want to store an unlimited number of Django objects in memory at once, and some benchmarking showed that the benefit of batching the inserts leveled off at a few hundred records.

Caveats

There are a few differences from normal object creation. First, save() is not called on the instances, nor are post_save signals sent, and the model instances' primary keys are not set. If you're doing anything more complicated than dumping a bunch of data into the database, you'll probably need to stick with creating objects individually.

Also, the code we're using to do the bulk insert does not handle ForeignKeys properly. The workaround is, when creating the Django objects, to set each ForeignKey's underlying ID attribute (for a field named author, that's author_id) to the primary key of the referenced object, if any.

Example

Here's what code for a bulk insert might look like.

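from bulkops import insert_many
from our_models import Book

objects = []
for data in data_source:
    # Assume data['foreign_key'] is a reference to another model instance.
    # bulkops doesn't handle ForeignKeys, so store the referenced object's
    # primary key in the field's underlying "_id" attribute instead.
    data['foreign_key_id'] = data.pop('foreign_key').pk
    objects.append(Book(**data))
    # Keep our batch size from getting too big
    if len(objects) >= 200:
        insert_many(objects)
        objects = []
# Insert whatever is left over from the last, partial batch
if objects:
    insert_many(objects)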

Django 1.4

The current development branch of Django has added a bulk insert feature, which seems likely to be included in Django 1.4. It's very similar to the code we're using here - just change "insert_many(objects)" to "Book.objects.bulk_create(objects)". That's subject to change before Django 1.4 is released, of course.

Credit

Credit goes to Karen for suggesting the approach to me, and Ole Laursen's blog post for the original idea and the implementation that we're using.

Links

Ole Laursen's blog post: http://ole-laursen.blogspot.com/2010/11/bulk-inserting-django-objects.html

Implementation: http://people.iola.dk/olau/python/bulkops.py

Original commit to Django development: https://code.djangoproject.com/changeset/16739

Testing Web Server Configurations with Fabric and ApacheBench

September 13 2011 by Tobias McNulty

Load testing a site with ApacheBench is fairly straightforward. Typically you'd just SSH to a machine on the same network as the one you want to test, and run a command like this:

ab -n 500 -c 50 http://my.web.server/path/to/page/

The -n argument determines the number of requests to execute, and the -c argument determines the concurrency level, or how many requests will be running simultaneously at any given time.

For Python and Django web applications, Fabric is a popular tool for deploying code to and running other commands on remote servers. It's written in Python, and its simple syntax makes it easy to use as well. For more information and a primer on Fabric, check out the post that Colin Copeland wrote back in 2010, titled Basic Django deployment with virtualenv, fabric, pip and rsync.

Running ApacheBench from Fabric is useful because you can easily do other things like customize and update your web server configuration in an automated way. For example, here's a sample template for an Apache server configuration that I upload to our web servers using Fabric:

ServerName %(www_server_name)s

WSGIDaemonProcess my_site-%(environment)s processes=%(process_count)s threads=%(thread_count)s display-name=%%{GROUP}
WSGIProcessGroup my_site-%(environment)s
WSGIScriptAlias / %(apache_root)s/%(environment)s.wsgi

ErrorLog %(log_root)s/wsgi.error.log
LogLevel info
CustomLog %(log_root)s/wsgi.access.log combined

You'll notice the %s-style Python string formatting syntax in the Apache config. These placeholders are filled in by the upload_template function from Fabric's fabric.contrib.files module when the file is copied to the remote server, using the variables you pass in as the context. Here's a sample Fabric function to upload your Apache configuration to the remote server:

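import os

from fabric.api import env, sudo
from fabric.contrib import files


def _join(*items):
    """
    We're deploying to Linux, so hard code that type of path join here. Using
    os.path.join would not work when deploying from Windows.
    """
    return '/'.join(items)

def apache_graceful():
    sudo('/etc/init.d/apache2 graceful')

def update_apache_conf(process_count=15, thread_count=1):
    # Store the counts on env so the templates can use %(process_count)s
    # and %(thread_count)s. env.deployment_dir, env.home and
    # env.environment are assumed to be set elsewhere (e.g. by "staging").
    env.process_count = process_count
    env.thread_count = thread_count
    for ext in ['conf', 'wsgi']:
        # Local template path, so the local OS's path join is correct here
        source = os.path.join(env.deployment_dir, 'templates',
                              'apache.%s' % ext)
        # Remote path on the Linux server
        dest = _join(env.home, 'apache.conf.d',
                     '.'.join([env.environment, ext]))
        files.upload_template(source, dest, context=env, mode=0o755,
                              use_sudo=True)
    apache_graceful()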

Specifying process_count and thread_count in the arguments to update_apache_conf() means that I can pass those in from the command line, like so:

fab staging update_apache_conf:10,3

This would install an Apache configuration on the server that starts up 10 mod_wsgi processes with 3 threads each.
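To see roughly what ends up on the server for that command, you can render a couple of lines of the template with Python's % operator, which, as far as we know, is what upload_template does when Jinja isn't enabled. Note how %%{GROUP} comes out as a literal %{GROUP} for Apache to interpret. The context below is hypothetical; its counts are strings because Fabric passes command-line task arguments as strings.

```python
TEMPLATE = (
    "WSGIDaemonProcess my_site-%(environment)s processes=%(process_count)s "
    "threads=%(thread_count)s display-name=%%{GROUP}\n"
    "WSGIProcessGroup my_site-%(environment)s\n"
)

# Hypothetical context, as if running "fab staging update_apache_conf:10,3".
context = {"environment": "staging", "process_count": "10", "thread_count": "3"}

rendered = TEMPLATE % context
print(rendered)
```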

Running ApacheBench through Fabric is also easy to do, but here's a slightly more complex example I put together that saves the results in time-stamped folders, whose names also include the number of requests, concurrency level, process count, and thread count of the test:

import datetime
import posixpath

from fabric.api import env, run


def benchmark():
    config = {
        'number': 500,
        'concurrency': 50,
        'url': 'http://my.web.server/path/to/page/',
    }
    # prime the server with a few requests before logging any results
    run('ab -n 10 -c 1 {url}'.format(**config))
    context = dict(env)
    context.update(config)
    context['now'] = datetime.datetime.now().strftime('%Y-%m-%d_%H:%M:%S')
    dir_name = '{now}_n={number},c={concurrency}'
    # process_count and thread_count are set on env by update_apache_conf
    if 'process_count' in context and 'thread_count' in context:
        dir_name += '_p={process_count},t={thread_count}'
    dir_name = dir_name.format(**context)
    # These paths live on the remote Linux server, so join them POSIX-style
    context['test_dir'] = posixpath.join('test_runs', dir_name)
    run('mkdir -p {0}'.format(context['test_dir']))
    for x in range(4):
        context['test_file'] = posixpath.join(context['test_dir'],
                                              'ab{0}.txt'.format(x))
        run('ab -n {number} -c {concurrency} {url} > '
            '{test_file}'.format(**context))

You can run these commands together to update the Apache configuration and run a benchmark with a single line from the shell, like so:

fab staging update_apache_conf:10,5 benchmark

This would update the Apache configuration on the remote server, run a few requests to prime the server, and then run the specified ApacheBench test 4 times and save the results in text files in a timestamped directory.
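Once the result files are copied back from the server, a few lines of Python can pull the throughput out of each run. This sketch assumes ApacheBench's usual "Requests per second:" summary line; the function names are hypothetical, and the sample reports below are made-up excerpts, not real results.

```python
import re

# Matches ApacheBench's summary line, e.g.
# "Requests per second:    120.50 [#/sec] (mean)"
RPS_RE = re.compile(r"^Requests per second:\s+([\d.]+)", re.MULTILINE)


def requests_per_second(ab_output):
    """Extract the mean requests per second from one ab report."""
    match = RPS_RE.search(ab_output)
    if match is None:
        raise ValueError("no 'Requests per second' line found")
    return float(match.group(1))


def mean_rps(reports):
    """Average the throughput across several runs of the same test."""
    values = [requests_per_second(text) for text in reports]
    if not values:
        raise ValueError("no reports given")
    return sum(values) / len(values)


# Made-up excerpts standing in for the contents of ab0.txt, ab1.txt, ...
reports = [
    "Concurrency Level:      50\nRequests per second:    120.50 [#/sec] (mean)\n",
    "Concurrency Level:      50\nRequests per second:    130.50 [#/sec] (mean)\n",
]
print(mean_rps(reports))
```

In practice you would read the ab*.txt files from each test_runs subdirectory and compare the averages across process and thread counts.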

To test lots of different server configurations at once with minimal user interaction, you can further script this by wrapping the above command in a Bash for loop, like so:

for process_count in {1..76..5}; do fab staging update_apache_conf:$process_count,1 benchmark; done

This command iterates from 1 through 76, in steps of 5 (1, 6, 11, 16 ... 76), sets the Apache configuration to use that number of processes, and runs a separate benchmark for each configuration.

Anyway, that's just a little insight into how one might deploy and test a Python or Django application using Fabric and ApacheBench. Hope you find it helpful!

Getting Started using Python in Eclipse

August 31 2011 by Dan Poirier

Eclipse with the PyDev module has a lot to offer the Python programmer these days. If you haven't looked at PyDev before, or not in a while, it's worth checking out.

Here are some of my favorite features:

  • One-keystroke navigation to the definitions of variables, methods, classes
  • Code completion, including automatically adding import statements
  • Clean up imports
  • Refactoring, including renaming across projects
  • Clean up whitespace

There are many more. I recommend taking a look at the PyDev web site and blog to see what might appeal to you.

Getting Eclipse and PyDev

If you're already using Eclipse, you can add PyDev to it. If not, you also have the option to get a version of Eclipse with PyDev already included. You install PyDev into your existing Eclipse the same way you install any other Eclipse add-on: first tell Eclipse where to find the add-on, then install it.

  • In Eclipse 3.6 and 3.7, select Help/Install New Software...
  • On the panel that pops up, click "Add..." at the top right.
  • Enter any name (e.g. "PyDev")
  • Enter http://pydev.org/updates as the Location, then click OK.
  • In the list of available software, select PyDev. 
  • Click Next, Next, accept the license, Finish.
  • If Eclipse asks whether to trust the PyDev certificate, agree.
  • When the install is complete, allow Eclipse to restart.

To get Eclipse with PyDev already installed, go to http://www.aptana.com/products/studio3/download and download Aptana Studio for your platform. Aptana Studio 3.0.4 is Eclipse 3.6 plus PyDev plus other add-ons.

Preferences

There are some preferences in Eclipse you probably want to change if you'll be working with Python.  Open the preferences by selecting Window/Preferences, then use search to find and set these:

  • Insert spaces for tabs: checked, but note that the PyDev editor ignores this and you need to make a similar setting in the PyDev settings for editing Python files.
  • Show whitespace characters:
    • In Eclipse 3.6, you probably want this off except when you're looking for trailing whitespace.
    • In Eclipse 3.7, you can check the box and then click on "whitespace characters" and set just the trailing whitespace visible, which is unobtrusive enough to leave enabled all the time.
  • Replace tabs with spaces when typing: checked.  This is the one that PyDev obeys.
  • Right trim lines: checked, otherwise you end up with a lot of lines with just indentation on them.
  • Add newline at end of file: checked.
  • Auto-Format editor contents before saving: If you check this, every time you save a file PyDev will fix it to comply with the other settings on this preferences page. That's great if you're working on your own project, but not so good if you're doing maintenance on somebody else's project and don't want to make random changes to white-space all over the place.

Explore the other PyDev settings. The "Code Analysis" section is particularly interesting, as it lets you control the kinds of things that Pydev marks as errors or warnings.

Finally, at least one Python interpreter needs to be configured.  Still in Preferences, go to PyDev/Interpreter - Python.  For now, just click "Auto Config" and click OK on the dialog that pops up.  Then click OK to close Preferences.  PyDev will take a while to analyze the Python installation and libraries.

Perspective

Select Window/Open Perspective/Other and choose PyDev.

Starting to use Eclipse and PyDev with a project

I typically use Eclipse with Django projects, though I haven't tried PyDev's Django-specific features yet.

When I want to work with a project in Eclipse, first I check it out locally. Then here are the steps I follow:

  • File/New/Project (not PyDev Project; I don't like the PyDev new project wizard)
  • Choose General/Project, click Next
  • Enter a project name
  • Uncheck "use default location" and set the location to the top directory of my project
  • Click Finish
  • Right-click on the project and select PyDev/Set as Pydev Project
  • Right-click on the project and select Properties
  • Go to PyDev - PYTHONPATH
  • In the Source Folders tab, use "Add source folder" to add folders that need to be on your Python path for your project to work.  Often this is either the top-level project folder or a folder immediately inside it.

Using PyDev with virtualenv

If you use virtualenv (and if not, why not?), there are a couple additional steps to take.

First, add the interpreter from your virtual environment as another Python interpreter:

  • Open Preferences
  • Go to PyDev/Interpreter - Python
  • Click "New..."
  • For the Executable, navigate to your virtual environment's bin directory and select the Python interpreter there.
  • Choose another name for your interpreter if you want, probably something shorter than the default.  I like to use the name of the virtual environment, with "-env" appended.
  • Click OK
  • Now here's the tricky part - a dialog will pop up asking which library folders to add.  Keep the defaults but you also need to add your system Python library directories - e.g. /usr/lib/python2.6, /usr/lib64/python2.6, and /usr/lib/python2.6/plat-linux.  Otherwise PyDev won't be able to find all the libraries your Python interpreter will be using.
  • Click OK
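If you're unsure which system library directories to add, one way to find out is to ask the virtual environment's interpreter. Run a script like this hypothetical helper with the Python from your virtual environment's bin directory; it prints the sys.path entries that live outside the environment, which are candidates for the extra folders PyDev needs.

```python
import os
import sys


def system_library_dirs(paths=None, env_prefix=None):
    """Return sys.path entries that are outside the virtual environment."""
    paths = sys.path if paths is None else paths
    env_prefix = os.path.normpath(sys.prefix if env_prefix is None else env_prefix)
    result = []
    for path in paths:
        if not path:
            continue  # an empty entry means the current directory
        path = os.path.normpath(path)
        # Skip the environment's own directories, keep everything else
        if path != env_prefix and not path.startswith(env_prefix + os.sep):
            result.append(path)
    return result


if __name__ == "__main__":
    for path in system_library_dirs():
        print(path)
```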

Then, set the new interpreter as the interpreter for your project:

  • Right-click the project and select Properties
  • Go to PyDev - Interpreter/Grammar
  • Under Interpreter, select your new interpreter
  • Click OK

Now PyDev should be able to find any libraries you have installed in the virtual environment when needed. 

If you install additional libraries, you might need to go back to the interpreter definitions, click "Apply", and tell PyDev which interpreters it should scan again. Until you do that, PyDev might not notice your new libraries.

For more information, see http://pydev.blogspot.com/2010/04/pydev-and-virtualenv.html


Caktus Consulting Group Sponsors DjangoCon 2011

August 31 2011 by Nicole Foster

DjangoCon 2011 is coming up next week and I'm excited to announce that Caktus is sponsoring the conference again this year! It is being held once again in beautiful Portland, Oregon from September 5th through the 10th. We've grown quite a bit since last year: 9 team members (Colin, Tobias, Karen, Mark, Dan, Scott, George, Caleb and I) will be attending the conference this year.

We are all really excited to hear some great talks, meet other Django developers and learn more about our all-time favorite framework. You can read about why we like it so much in our blog post Why Caktus Uses Django.

Lightning Talk Lunch: Service Page API

August 17 2011 by Colin Copeland

Leading the second talk of our Caktus Lightning Talk Lunch series, Calvin Spealman presented on the Service Page API:

The Service Page API is a prototype and proof of concept to deliver a wide range of browser plugins across multiple browsers and to extend the APIs available to websites a user visits by allowing plugins to extend the JavaScript API with new libraries, integrate with external services, and more. It puts the power in the user's hands to control which services can interact. This talk covers the problems with the current state of browser extensions and the difficulty of building them consistently across multiple browsers, and how the Service Page API solves this, with code examples.


The slides from this talk are available on talks.caktusgroup.com and the code can be found on GitHub. Follow his project on GitHub to stay up-to-date on its status!

Managing Client Expectations Amid Shifting Deadlines

August 11 2011 by George Saines

Estimating development time is notoriously difficult, and when moving deadlines are added to the mix, shift happens.

Estimating development time for clients is difficult enough without having to second guess deadlines. Yet despite the best efforts, if your company has a healthy deal flow, it’s almost inevitable that you’ll eventually have a project deadline shift.

It seems an inexorable law of nature that deadlines always move forward. Projects slated for 10 weeks suddenly become 6-week sprints, and 4-week projects suddenly turn into 14 days of pain. Shifting deadlines cause a lot of stress even when clients and project managers communicate perfectly, but they are an absolute nightmare if either party doesn’t take responsibility early to communicate a new set of expectations.

Since I have the most experience managing projects, I’ll speak from the perspective of a project manager. Here are 3 steps that I have found to significantly reduce stress when clients need to alter the delivery schedule:

1. Do not commit to new milestones without internal communication.

When I was in the 4th grade, my best friend and I spent most of our free time together. Normally I would call him on the phone and while still talking, ask my parents if I could do whatever he and I were planning (normally riding bikes). With my friend on the line I would ask “Mom and Dad, can I go out bike-riding with Casey in an hour?” This approach often made my parents frustrated. I would be made to hang up, talk to them, and call back.

That little story might seem unrelated, but it is very similar to a client calling to say they need to release 4 weeks earlier than planned. Even if nobody could have predicted a change in the schedule, as the manager of that project, you now have a problem. You and your colleagues likely have other projects in the pipe, an abundance of work for that period, a vacation day or two, and a web of unseen working commitments.

The client will be under more pressure still and wants to hear you say “okay, no problem, we can get that done 4 weeks early.” Even if you think your team can do it, never make that assumption when speaking with a client. The best way to handle the situation is to tell the client you have to talk with the team members and get back to them. Then, start talking with your team.

2. Consult with team members.

Internal discussion is especially necessary when there are aspects of a project that the project manager doesn’t understand 100%. A lack of understanding could be due to the limitations of a particular PM or the vastness of the project, but the person doing the coding is almost always in a better place than a manager to evaluate changes in engineering specs and deadlines.

In the case of a 4-week project adjustment, the operative question is how to balance the altered client interests with the original contract. Can features be cut? Can other projects be sidelined for a week? Are people willing to work overtime? Obviously the desirability of answers to these questions will depend on your specific situation, but the important part is to have the discussion. This will get everyone on the same page and present a unified front to the client. It also makes for more professional presentation and management.

3. Clearly communicate internal resolutions to the client.

After the internal meeting, contact the client and communicate the talking points in clear, direct statements. It’s easy, especially when under pressure from clients, to waffle, but resist the urge. A statement like “I spoke to the team and we aren’t sure the deployment schedule is realistic given the change in the deadline” is terrible because it isn’t crystal clear. What you mean to say in this case is “we cannot deploy on time.” So why not say it? Obviously don’t be rude, but straightforward, simple communication will avoid future misunderstanding.

Simply go through what your team can realistically achieve in plain language, making sure to address the most critical deliverables. I have found it is best to lead with the items that won’t get done to client satisfaction, and conclude with items that will. The balancing effect of good and bad news tends to lean more in favor of positive reactions when those are the last things mentioned.

Conclusion

The easiest way to make your team members hate you and your client take their business elsewhere is to overpromise and underdeliver on a tight schedule. When deadlines change mid-project, the project is immediately restricted to a less desirable set of outcomes. Using the communication techniques outlined above, however, project managers can often turn bad situations into opportunities for glowing client stories. Sticking to mutual expectations is what drives client satisfaction, and even when circumstances conspire to restrict expectations, you can still impress.

Junior Django Developer Wanted

August 09 2011 by Nicole Foster

Caktus is currently seeking a junior Django developer for our team. The ideal candidate would have 6 months of experience building dynamic web applications in any language, at least 3 months of experience using Python and Django, and also have a basic understanding of relational databases such as PostgreSQL and MySQL. The junior developer position will consist of data modeling complex business ideas, creating and integrating Django applications in new projects, maintaining existing Django projects and also assisting with Django deployments.

If this sounds like something you might be interested in, or you know someone who might be, check out the entire job posting here. If you would like to apply, please submit your resume and code samples to jobs+website@caktusgroup.com.

We're hiring a Front End Web Developer

August 04 2011 by Nicole Foster

I'm excited to announce that Caktus is actively seeking a Front End Web Developer. The position would entail creating wireframes and mockups of proposed designs and user stories, performing front-end jQuery and JavaScript development, converting PSDs into standards-compliant HTML and CSS, and cloning repositories and running Django sites locally for development. The position would also include user experience design for internal and client projects. This is a contract-to-hire position with the potential to become a full-time position with benefits. For a more detailed description of what you'd do as a Front End Web Developer here at Caktus, check out the full job posting here.

If you think you might be a great fit or know someone who might be, please send your resume and links to work you've done to jobs+website@caktusgroup.com. We would love to hear from you!

An alternative RapidSMS router implementation (with Celery!)

July 18 2011 by Colin Copeland

We've been using RapidSMS, a Django-powered SMS framework, more and more frequently here at Caktus. It has evolved a lot over the past year, from being reworked to feel more like a Django app, to merging the rapidsms-core-dev and rapidsms-contrib-apps-dev repositories into a single codebase (no more submodules!), to finally becoming installable via PyPI. The "new core" is in a great state now and is much easier to work with. However, one particular aspect of RapidSMS, the route process, has always been complicated and confusing to deal with. Tobias began the conversation on this issue after returning from a six-week UNICEF project in Zambia. He summarized the route process like so:

  • The route process as it currently stands is complicated; it includes a number of threads, and the ways in which they interact are not always intuitive
  • If the route process dies unexpectedly, all backends (and hence message processing) are brought offline
  • Automated testing is difficult and inefficient, because the router (and all its threads) needs to be started/stopped for each test

The RapidSMS router is a globally instantiated object that routes incoming messages through each RapidSMS app and sends outgoing messages via installed backends. The run_router management command starts the router process and creates an individual thread for each backend defined in the settings module. I'm not entirely certain why the route process was originally threaded, but I assume it was designed to make it easier to integrate blocking backends (like GSM modems) into RapidSMS. However, with the standardization of Kannel and SMS-based web services like Twilio, both of which offload the low-level communication work, I believe the threading aspect is now less important.

So recently, in what started as a proof of concept, we began work on a decoupled router implementation called rapidsms-threadless-router. rapidsms-threadless-router provides a threadless_router app, which removes the threading functionality from the legacy Router class. Instead, all inbound requests are handled in the main HTTP thread. threadless_router attempts to:

  • Make RapidSMS backends more Django-like by using Django’s URL routing and views to handle inbound HTTP requests
  • Remove the clutter and complexity of the route process and threaded backends
  • Ease testing: no more threading or Queue modules slowing down tests
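To make the per-request idea concrete, here is a minimal sketch in plain Python of a router that is created fresh for each inbound HTTP request and passes the message through a list of apps in the same thread, with nothing to start or stop. The names `Router`, `App`, `handle`, `inbound_view`, and the `identity`/`text` request parameters are hypothetical simplifications for illustration; they are not the actual rapidsms-threadless-router or RapidSMS API.

```python
from urllib.parse import parse_qs


class App:
    """Hypothetical, simplified stand-in for a RapidSMS app."""

    def handle(self, message):
        # Return a reply string if this app handles the message, else None.
        return None


class PingApp(App):
    def handle(self, message):
        if message["text"].strip().lower() == "ping":
            return "pong"
        return None


class EchoApp(App):
    def handle(self, message):
        if message["text"].lower().startswith("echo "):
            return message["text"][5:]
        return None


class Router:
    """Created once per request: no threads, no queues, no start/stop."""

    def __init__(self, apps):
        self.apps = apps
        self.outgoing = []

    def incoming(self, identity, text):
        message = {"identity": identity, "text": text}
        # In this sketch, stop at the first app that produces a reply.
        for app in self.apps:
            reply = app.handle(message)
            if reply is not None:
                # A real backend would send the reply; here it is only recorded.
                self.outgoing.append((identity, reply))
                break
        return self.outgoing


def inbound_view(query_string, apps):
    """Sketch of a view: parse the request and route it in the same thread."""
    params = parse_qs(query_string)
    identity = params["identity"][0]
    text = params["text"][0]
    router = Router(apps)  # a fresh router for every request
    return router.incoming(identity, text)


if __name__ == "__main__":
    print(inbound_view("identity=5551234&text=ping", [PingApp(), EchoApp()]))
```

Because nothing runs in the background, a unit test can call `inbound_view` or `Router.incoming` directly and inspect the result, which is the testing benefit listed above.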

In contrast to the legacy route process, threadless_router handles all inbound and outbound backend communication from within the main HTTP thread. Each request creates a new router instance, and no separate process or thread is started. This simplifies the Router class significantly. Additionally, threadless_router allows inbound messages to be easily passed off to an asynchronous task queue, such as Celery. Task queues allow message processing to happen outside of the HTTP request/response cycle, which is perfect for SMS-based applications, where out-of-band responses are more than acceptable.
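The hand-off to a task queue can be sketched without Celery itself. Below, an in-memory `queue.Queue` stands in for the message broker, the view only enqueues the message and answers the gateway immediately, and a separate worker function (the role a Celery worker process would play) drains the queue later. The names `receive_view`, `process_pending`, and `handle_message` are hypothetical; with Celery, the enqueue step would typically be a call such as `handle_message.delay(identity, text)` on a function registered as a task.

```python
import queue

# Stand-in for the message broker that would sit behind Celery.
broker = queue.Queue()
# Replies "sent" by the worker; a real app would hand these to a backend.
sent = []


def handle_message(identity, text):
    """The slow part: run the message through the apps and send a reply."""
    reply = "You said: %s" % text
    sent.append((identity, reply))


def receive_view(identity, text):
    """HTTP view: enqueue the work and respond to the gateway right away."""
    broker.put((identity, text))
    return 200, "OK"


def process_pending():
    """What a worker does: pull queued messages and process them."""
    processed = 0
    while True:
        try:
            identity, text = broker.get_nowait()
        except queue.Empty:
            break
        handle_message(identity, text)
        processed += 1
    return processed


if __name__ == "__main__":
    print(receive_view("5551234", "hello"))  # the HTTP response is immediate
    print(process_pending())                 # the reply is produced later
    print(sent)
```

The design point is that the HTTP response no longer waits on message processing; the reply goes out whenever the worker gets to it, which is acceptable for SMS.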

threadless_router is not, however, a drop-in replacement for the legacy router. Legacy backends will not work, and because all routing is handled from within the HTTP thread, non-HTTP backends such as pygsm are not currently compatible with threadless_router. A simple wrapper around pygsm could be written to both talk to the modem and spin up a simple HTTP server to communicate with RapidSMS. This would decouple pygsm from RapidSMS and let it run as its own separate process. supervisord would work well for managing that process, too. Several contrib applications, such as httptester and scheduler, are also not compatible. We've bundled a new httptester as a replacement, and celerybeat can be used to mimic the scheduler functionality. A full list of caveats can be found in the docs.
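The pygsm wrapper suggested above might look roughly like the following sketch, built only on the Python standard library. It assumes a small HTTP contract of our own invention: RapidSMS asks the wrapper to send a message with `GET /send?identity=...&text=...`, and the wrapper forwards messages read from the modem by POSTing `identity` and `text` to a RapidSMS URL. `FakeModem`, `RAPIDSMS_INBOUND_URL`, and all function names are hypothetical; `FakeModem` merely stands in for pygsm and does not reflect pygsm's real interface.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request, urlopen

# Hypothetical URL of the RapidSMS view that accepts inbound messages.
RAPIDSMS_INBOUND_URL = "http://localhost:8000/backend/gsm/"


class FakeModem:
    """Hypothetical stand-in for a pygsm modem; not pygsm's real API."""

    def __init__(self):
        self.sent = []

    def send_sms(self, identity, text):
        self.sent.append((identity, text))


MODEM = FakeModem()


def parse_send_request(path):
    """Extract (identity, text) from '/send?identity=...&text=...', else None."""
    url = urlparse(path)
    params = parse_qs(url.query)
    if url.path != "/send" or "identity" not in params or "text" not in params:
        return None
    return params["identity"][0], params["text"][0]


class OutgoingHandler(BaseHTTPRequestHandler):
    """RapidSMS calls this server to send outgoing messages via the modem."""

    def do_GET(self):
        parsed = parse_send_request(self.path)
        if parsed is None:
            self.send_response(400)
            self.end_headers()
            return
        MODEM.send_sms(*parsed)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"OK")


def inbound_request(identity, text):
    """Build the POST that forwards a message read from the modem to RapidSMS."""
    data = urlencode({"identity": identity, "text": text}).encode()
    return Request(RAPIDSMS_INBOUND_URL, data=data, method="POST")


def forward_inbound(identity, text):
    """Deliver one inbound message to RapidSMS over plain HTTP."""
    with urlopen(inbound_request(identity, text)) as response:
        return response.status


def run(port=8080):
    """Serve outgoing-message requests forever (blocks the calling process).

    A real wrapper would also poll the modem for new messages and call
    forward_inbound() for each one, e.g. in a second loop or process.
    """
    HTTPServer(("127.0.0.1", port), OutgoingHandler).serve_forever()
```

Running `run()` as its own process under supervisord would keep the modem connection independent of the Django site, which is the decoupling described above.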

The full documentation for rapidsms-threadless-router, including installation instructions and examples, can be found on readthedocs.org. If you're already familiar with the internals of RapidSMS and would like to see examples of threadless backend implementations, I suggest reviewing the bundled http and httptester backends and our updated twilio backend.

I would like to mention that Nicolas Pottier and Eric Newcomer created rapidsms-httprouter, which also handles all messages within the main HTTP thread. The main difference between rapidsms-httprouter and rapidsms-threadless-router is that, while httprouter handles inbound messages in a Django view, it still starts threads for handling outgoing messages, as the current router does (though it starts them from within a Django view). Make sure to check it out as well, and let us know what you think!
