Archive for May, 2009

Testing Django Views for Concurrency Issues

May 26th, 2009 by tobias

At Caktus, we rely heavily on automated testing for web app development. We create tests for all the code we write, ideally before the code is written. We create tests for every bug we find and, resources permitting, ramp up the test suite with lots of random input and boundary testing.

Debugging concurrency issues or race conditions has long been a nightmare. There are only so many times you can double-click the link in your web app that is generating some bizarre failure.

Using the Django test client, I created a little decorator that you can use in your unit tests to make sure a view doesn’t blow up when it’s called multiple times with the same arguments. If it does blow up, and you happen to be using PostgreSQL, chances are you can fix the issues by using Colin’s previously posted require_lock decorator.

Here’s the decorator for testing concurrency:

def test_concurrently(times):
    """ 
    Add this decorator to small pieces of code that you want to test
    concurrently to make sure they don't raise exceptions when run at the
    same time.  E.g., some Django views that do a SELECT and then a subsequent
    INSERT might fail when the INSERT assumes that the data has not changed
    since the SELECT.
    """
    def test_concurrently_decorator(test_func):
        def wrapper(*args, **kwargs):
            exceptions = []
            import threading
            def call_test_func():
                try:
                    test_func(*args, **kwargs)
                except Exception as e:
                    exceptions.append(e)
                    raise
            threads = []
            for i in range(times):
                threads.append(threading.Thread(target=call_test_func))
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            if exceptions:
                raise Exception('test_concurrently intercepted %s exceptions: %s' % (len(exceptions), exceptions))
        return wrapper
    return test_concurrently_decorator

To use this in a test, create a small function inside your test that contains the code you want to exercise; that code must itself be thread-safe. Apply the decorator, passing the number of times you want to run the code simultaneously, and then call the function:

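from django.core.urlresolvers import reverse  # django.urls on Django 1.10+
from django.test import Client, TestCase

class MyTestCase(TestCase):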
    def testRegistrationThreaded(self):
        url = reverse('toggle_registration')
        @test_concurrently(15)
        def toggle_registration():
            # perform the code you want to test here; it must be thread-safe 
            # (e.g., each thread must have its own Django test client)
            c = Client()
            c.login(username='user@example.com', password='abc123')
            response = c.get(url)
        toggle_registration()

Explicit Table Locking with PostgreSQL and Django

May 26th, 2009 by Colin Copeland

By default, Django doesn’t do explicit table locking. This is OK for most read-heavy scenarios, but sometimes you need guaranteed, exclusive access to the data. Caktus uses PostgreSQL in most of our production environments, so we can use the various lock modes it provides to control concurrent access to the data. Once we obtain a lock in PostgreSQL, it is held for the remainder of the current transaction. Django provides transaction management, so all we need to do is execute a SQL LOCK statement within a transaction, and Django and PostgreSQL will handle the rest.

Below is an example decorator we came up with to provide easy table-locking access in Django:

from django.db import transaction
 
LOCK_MODES = (
    'ACCESS SHARE',
    'ROW SHARE',
    'ROW EXCLUSIVE',
    'SHARE UPDATE EXCLUSIVE',
    'SHARE',
    'SHARE ROW EXCLUSIVE',
    'EXCLUSIVE',
    'ACCESS EXCLUSIVE',
)
 
def require_lock(model, lock):
    """
    Decorator for PostgreSQL's table-level lock functionality
 
    Example:
        @transaction.commit_on_success
        @require_lock(MyModel, 'ACCESS EXCLUSIVE')
        def myview(request):
            ...
 
    PostgreSQL's LOCK Documentation:
    http://www.postgresql.org/docs/8.3/interactive/sql-lock.html
    """
    def require_lock_decorator(view_func):
        def wrapper(*args, **kwargs):
            if lock not in LOCK_MODES:
                raise ValueError('%s is not a PostgreSQL supported lock mode.' % lock)
            from django.db import connection
            cursor = connection.cursor()
            cursor.execute(
                'LOCK TABLE %s IN %s MODE' % (model._meta.db_table, lock)
            )
            return view_func(*args, **kwargs)
        return wrapper
    return require_lock_decorator
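
To check what the decorator sends to the database without a Django project or a PostgreSQL server, the validation and the statement text can be pulled out into a plain function. This is only a sketch: `build_lock_sql` and the table name `myapp_mymodel` are hypothetical names used for illustration and are not part of Django or of the decorator above.

```python
LOCK_MODES = (
    'ACCESS SHARE',
    'ROW SHARE',
    'ROW EXCLUSIVE',
    'SHARE UPDATE EXCLUSIVE',
    'SHARE',
    'SHARE ROW EXCLUSIVE',
    'EXCLUSIVE',
    'ACCESS EXCLUSIVE',
)

def build_lock_sql(table_name, lock):
    """Return the LOCK statement text, rejecting unknown lock modes."""
    if lock not in LOCK_MODES:
        raise ValueError('%s is not a PostgreSQL supported lock mode.' % lock)
    return 'LOCK TABLE %s IN %s MODE' % (table_name, lock)

print(build_lock_sql('myapp_mymodel', 'ACCESS EXCLUSIVE'))
# prints: LOCK TABLE myapp_mymodel IN ACCESS EXCLUSIVE MODE

try:
    build_lock_sql('myapp_mymodel', 'exclusive')  # wrong case is rejected
except ValueError as e:
    print(e)
```

Checking the mode against a fixed tuple matters because the mode is interpolated straight into the SQL string, so it should never come from user input unchecked.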

This is by no means a perfect solution. Feel free to comment below.

Parsing Microseconds in a Django Form

May 26th, 2009 by tobias

There’s currently no way to accept microsecond-precision input through a Django form’s DateTimeField. This is an acknowledged bug, but the official solution might not come very soon, because the real fix is non-trivial.

In the meantime, here’s one approach that will work in most cases:

class DateTimeWithUsecsField(forms.DateTimeField):
    def clean(self, value):
        if value and '.' in value: 
            value, usecs = value.rsplit('.', 1) # rsplit in case '.' is used elsewhere
            usecs += '0'*(6-len(usecs)) # right pad with zeros if necessary
            try:
                usecs = int(usecs) 
            except ValueError: 
                raise forms.ValidationError('Microseconds must be an integer')
        else: 
            usecs = 0 
        cleaned_value = super(DateTimeWithUsecsField, self).clean(value)
        if cleaned_value:
            cleaned_value = cleaned_value.replace(microsecond=usecs)
        return cleaned_value

To use this in a model form, you can override the field like so:

class MyForm(forms.ModelForm):
    def __init__(self, *arg, **kwargs):
        super(MyForm, self).__init__(*arg, **kwargs)
        self.fields['date'] = DateTimeWithUsecsField()
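
The splitting and padding logic can also be tried on its own, outside a form. The sketch below is not the field itself: `split_usecs` and `parse_with_usecs` are hypothetical helper names, and `datetime.strptime` with a single fixed format stands in for the parent `DateTimeField.clean()`.

```python
from datetime import datetime

def split_usecs(value):
    """Split 'YYYY-MM-DD HH:MM:SS.ffffff' into (datetime text, microseconds)."""
    if value and '.' in value:
        value, usecs = value.rsplit('.', 1)
        usecs += '0' * (6 - len(usecs))  # right pad: '.25' means 250000 usecs
        usecs = int(usecs)               # raises ValueError for non-digits
    else:
        usecs = 0
    return value, usecs

def parse_with_usecs(value):
    """Parse the whole-second part, then attach the microseconds."""
    value, usecs = split_usecs(value)
    parsed = datetime.strptime(value, '%Y-%m-%d %H:%M:%S')
    return parsed.replace(microsecond=usecs)

print(parse_with_usecs('2009-05-26 14:30:59.25'))
# prints: 2009-05-26 14:30:59.250000
```

The right padding is the important detail: the digits after the dot are a decimal fraction of a second, so `.25` has to become 250000 microseconds, not 25.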

Seamlessly switch off (and on) a Django (or other WSGI) site for upgrades

May 25th, 2009 by tobias

In preparation for migrating the EveryWatt database from one machine to another, I wrote this little WSGI script to easily disable the site while I copy the data. Since it doesn’t depend on Django or really anything else (other than a functioning WSGI server), you can use it for other upgrades, too.

This is useful for preventing updates to the database while you, for example, dump the database on one machine and load it on another. With everything else already in place on either side, the user should only see the “Upgrade in progress” message for a few minutes.

Since EveryWatt includes a number of data logger clients that upload utility meter readings to the site through its Open API, I wanted to make sure any POST attempts received a temporary failure message (the data logger will store the data and retry the POST every minute), hence the 405 Method Not Allowed for all non-GET requests.

Here’s the script:

import os
import sys
 
UPGRADING = False
 
#Calculate the project path based on the location of the WSGI script.
project_dir = os.path.dirname(__file__)
sys.path.append(project_dir)
 
def upgrade_in_progress(environ, start_response):
    upgrade_file = os.path.join(project_dir, 'media', 'html', 'upgrade.html')
    if os.path.exists(upgrade_file):
        response_headers = [('Content-type','text/html')]
        response = open(upgrade_file).read()
    else:
        response_headers = [('Content-type','text/plain')]
        response = 'Application upgrade in progress...please check back soon.'
 
    if environ['REQUEST_METHOD'] == 'GET':
        status = '503 Service Unavailable'
    else:
        status = '405 Method Not Allowed'
    start_response(status, response_headers)
    return [response]
 
if UPGRADING:
    application = upgrade_in_progress
else:
    os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'
    import django.core.handlers.wsgi
    application = django.core.handlers.wsgi.WSGIHandler()
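
Before flipping the switch on a live site, the status logic can be checked without a web server by calling the WSGI callable directly. This is a rough sketch under a few assumptions: it is written for Python 3 (so the body is bytes), the upgrade handler is trimmed to the plain-text branch, and the `call` helper is a hypothetical name. `setup_testing_defaults` comes from the standard library's `wsgiref.util`.

```python
from wsgiref.util import setup_testing_defaults

def upgrade_in_progress(environ, start_response):
    """Trimmed copy of the handler above (plain-text branch only)."""
    response = b'Application upgrade in progress...please check back soon.'
    if environ['REQUEST_METHOD'] == 'GET':
        status = '503 Service Unavailable'
    else:
        status = '405 Method Not Allowed'
    start_response(status, [('Content-type', 'text/plain')])
    return [response]

def call(method):
    """Invoke the app with a minimal fake request; return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ['REQUEST_METHOD'] = method
    captured = {}
    def start_response(status, headers):
        captured['status'] = status
    body = b''.join(upgrade_in_progress(environ, start_response))
    return captured['status'], body

print(call('GET')[0])   # prints: 503 Service Unavailable
print(call('POST')[0])  # prints: 405 Method Not Allowed
```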

And in case you need it, here’s one way to dump a PostgreSQL database on one machine while you load it on another, to be run on the new host, as the database superuser:

pg_dump -h <old host> -U <user> <old database> | psql <new database>

Good luck and please post your questions/comments.

Eclipse Ganymede and Subclipse on Ubuntu - JavaHL (JNI) not available

May 21st, 2009 by tobias

I finally got around to updating my Eclipse, PyDev, and Subclipse environment today, which I use for Django development.

Formerly I was using the SvnKit (pure-Java) libraries. SvnKit “felt” slow to me, compared to my command line SVN client, so this time I tried to get the JavaHL (JNI) libraries working.

For the record, I’m using Ubuntu (jaunty) with Eclipse 3.4 (Ganymede). This version of Ubuntu comes with Subversion 1.5, so I needed to install Subclipse 1.4. See:

http://subclipse.tigris.org/servlets/ProjectProcess?pageID=p4wYuA

I installed everything through the Eclipse update manager (minus SvnKit), but JavaHL didn’t show up under Preferences -> Team -> SVN. The error message was: JavaHL (JNI) not available.

I had installed Eclipse manually (not through apt-get), so the solution was to install the JavaHL libraries:

apt-get install libsvn-java

and add the following line to my eclipse.ini (usually in the top level eclipse directory):

-Djava.library.path=/usr/lib/jni

Restart Eclipse, and you should be good to go!