December 12 2011 by Colin Copeland
As Tobias mentioned in Scraping Data and Web Standards, Caktus is collaborating with the UNC School of Journalism to help develop Open Rural (the code is on GitHub). Open Rural hopes to help rural newspapers in North Carolina leverage OpenBlock. This blog post is the first of several covering the internals of OpenBlock and, specifically, the geocoder.
OpenBlock Data Model
The OpenBlock geocoder can only geocode from the data it has. It doesn't leverage a 3rd-party API or service. It only uses what's loaded in PostgreSQL (with PostGIS and GeoDjango) and, in this example, what comes from the US Census Bureau and local city and county GIS offices.
Further, the imported data is typically filtered by a bounding box setting in METRO_LIST. The setting, extent, is a list of leftmost longitude, lower latitude, rightmost longitude, upper latitude. This defines a bounding box - the range of latitudes and longitudes that are relevant to your area. A small or restrictive box will limit imported ZIP code and block data to areas that fall within the box.
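To make the order of the four numbers concrete, here is a minimal sketch of a point-in-box test in plain Python. It is not OpenBlock's import code (which compares geometries through GeoDjango); the Extent type, the contains function, and the sample box and points are all hypothetical and exist only for illustration.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    """Bounding box, in the same order the 'extent' setting uses."""
    min_lon: float  # leftmost longitude
    min_lat: float  # lower latitude
    max_lon: float  # rightmost longitude
    max_lat: float  # upper latitude


def contains(extent: Extent, lon: float, lat: float) -> bool:
    """Return True when the point lies inside (or on the edge of) the box."""
    return (extent.min_lon <= lon <= extent.max_lon
            and extent.min_lat <= lat <= extent.max_lat)


# Hypothetical box and points, chosen only to exercise the function.
box = Extent(-79.2, 35.8, -78.9, 36.1)
inside = contains(box, -79.05, 35.91)   # well within the box
outside = contains(box, -78.64, 35.78)  # too far east and too far south
print(inside, outside)
```

Anything that fails a test like this is what the importers report as "out of bounds".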
Let's look at an example with these shapefiles:
We'll start with a restrictive extent that only consists of downtown Chapel Hill:
METRO_LIST = (
    {
        # Extent of the region, as a longitude/latitude bounding box.
        'extent': (-79.066272, 35.910663, -79.040481, 35.91671),
        # ...
    },
)
This selection loaded 2 ZIP codes:
$ django-admin.py import_nc_zips
Importing zip codes...
# ...
Skipping 27511, out of bounds
Skipping 27513, out of bounds
Created ZIP Code 27514
Created ZIP Code 27516
Skipping 27517, out of bounds
Skipping 27519, out of bounds
# ...
Created 2 zipcodes.
And limited the block data as well:
$ django-admin.py import_county_streets 37135
Importing blocks, this may take several minutes ...
Created 73 blocks
Populating streets and fixing addresses, these can take several minutes...
Populating the streets table
streets: created: 28
block_intersections: created: 160
Done.
Restricting the area will limit the ability of the geocoder. In this case, for example, it can geocode the intersection of Franklin and Henderson, which is right downtown, but not Franklin and Estes (don't worry, we'll get into more geocoding details in the next section). A map helps illustrate this more clearly. Below you can see the bounding box with pins on the two intersections:
[Map: OpenRural - Downtown Chapel Hill]
If we increase the bounding box, we'll get a lot more data:
METRO_LIST = (
    {
        # Extent of the region, as a longitude/latitude bounding box.
        'extent': (-79.165922, 35.829095, -78.978468, 36.02426),
        # ...
    },
)
With an extent that encompasses all of Chapel Hill, the importer loaded 9 ZIP codes, 4302 blocks, 1699 streets, and 7189 intersections. Here's a map illustrating the larger extent:
[Map: OpenRural - Orange County, NC]
It's up to the maintainer of an OpenBlock install to determine which extent to use as it is based on the specifics of the application. A large extent will import more ZIP codes and blocks and, therefore, will slow down geospatial queries and may include unwanted geographic areas.
Street
Now that we have NC Orange County data loaded, let's investigate this data with the OpenBlock models.
The Street model contains a catalog of all loaded streets. It's a simple model with only a few fields:
- street
- pretty_name
- street_slug
- suffix
- city
- state
In NC Orange County, we can see that the street data spans 4 cities:
>>> from ebpub.streets.models import Street
>>> Street.objects.order_by('city').values_list('city', flat=True).distinct()
[u'', u'CARRBORO', u'CHAPEL HILL', u'DURHAM', u'HILLSBOROUGH']
Some streets cross city lines and therefore contain two entries:
>>> Street.objects.filter(street_slug='rosemary-st').values_list('city', flat=True)
[u'CARRBORO', u'CHAPEL HILL']
And, for example, if we're looking for Franklin St. in Chapel Hill, NC, we can filter for it here:
Blocks
Blocks are fundamental to OpenBlock and are used by the geocoder. OpenBlock defines a block as "a segment of a single street between one side street and another side street." The Block model is slightly more intricate than Street, but each entry basically represents the address range of a street for each block segment.
To start, we can see that Franklin St. is divided into 32 blocks:
>>> from ebpub.streets.models import Block
>>> Block.objects.filter(street_slug='franklin-st').count()
32
It's sectioned into an east and west segment:
>>> Block.objects.filter(street_slug='franklin-st').order_by('street_pretty_name').values_list('street_pretty_name', 'predir').distinct()
[(u'Franklin St.', u'W'), (u'Franklin St.', u'E')]
And can have an address between 100 and 1899:
>>> from django.db.models import Max, Min
>>> Block.objects.filter(street_slug='franklin-st').aggregate(Min('from_num'), Max('to_num'))
{'from_num__min': 100, 'to_num__max': 1899}
So we can find the block that contains the 123 address:
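The lookup itself comes down to a range check on each block's address numbers. Here is a minimal sketch of that check in plain Python, outside the ORM. The BlockRange tuple, the find_blocks helper, and the sample rows are hypothetical; only the from_num, to_num, and predir names mirror the Block fields queried above.

```python
from collections import namedtuple

# Hypothetical stand-in for a Block row; the field names mirror the model.
BlockRange = namedtuple("BlockRange", "pretty_name predir from_num to_num")

# Made-up sample rows for illustration, not data from the Orange County import.
blocks = [
    BlockRange("Franklin St.", "E", 100, 199),
    BlockRange("Franklin St.", "E", 200, 299),
    BlockRange("Franklin St.", "W", 100, 199),
]


def find_blocks(rows, number, predir=None):
    """Return the rows whose address range contains `number`."""
    return [
        row for row in rows
        if row.from_num <= number <= row.to_num
        and (predir is None or row.predir == predir)
    ]


matches = find_blocks(blocks, 123, predir="E")
print(matches)
```

With the ORM, the same condition would normally be written with from_num__lte and to_num__gte lookups on Block.objects.filter(). Without the direction prefix, the east and west halves of the street both match.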
Also, on a side note, it's possible for some blocks to span cities:
Geocoding
Now that we have a basic understanding of how the data is stored within OpenBlock, let's do some geocoding. Most of these examples will use the SmartGeocoder class. SmartGeocoder delegates to specific geocoders (AddressGeocoder, BlockGeocoder, and IntersectionGeocoder) based on how it interprets the string with regular expressions.
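As a rough illustration of that delegation, the sketch below routes a string to a geocoder name with a few regular expressions. The patterns and the pick_geocoder function are hypothetical and far simpler than OpenBlock's own parsing; only the three geocoder names come from the text above.

```python
import re

# Hypothetical patterns; OpenBlock's real expressions are more thorough.
INTERSECTION_RE = re.compile(r"\s+(?:and|&|at)\s+", re.IGNORECASE)
BLOCK_RE = re.compile(r"^\d+\s+block\s+of\s+", re.IGNORECASE)
ADDRESS_RE = re.compile(r"^\d+\s+\S")


def pick_geocoder(location):
    """Return the name of the geocoder a string would be routed to."""
    text = location.strip()
    if INTERSECTION_RE.search(text):
        return "IntersectionGeocoder"
    if BLOCK_RE.match(text):
        return "BlockGeocoder"
    if ADDRESS_RE.match(text):
        return "AddressGeocoder"
    return None  # nothing recognizable as a street location


for sample in ("123 East Franklin Street",
               "Franklin and Henderson",
               "100 block of Franklin St"):
    print(sample, "->", pick_geocoder(sample))
```

The order of the checks matters: the intersection test runs first so that a string containing both a number and "and" is not mistaken for a plain address.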
Addresses
To start, let's geocode "123 East Franklin Street":
This one was pretty easy for the geocoder to parse and find. You can see that not only has it found the associated block, but it also knows the exact geographic point. However, this will fail if passed a non-existent address number (InvalidBlockButValidStreet):
In this case, the geocoder was able to extract the address, but it failed to find the associated block in the database. Non-existent streets also fail (DoesNotExist):
Intersections
The geocoder can locate intersections too:
Notice how the intersection field is populated, rather than block. This will raise a DoesNotExist exception when an intersection is not found:
Street Misspellings
OpenBlock provides a model, StreetMisspelling, to define street aliases. This allows you to map a bad street name to a good street name that exists in the database:
Now geocoding "Glen Haven" will find "Glenhaven".
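Conceptually, the alias table is a lookup applied to the street name before the street search runs. Here is a minimal sketch of that idea in plain Python; the dictionary, the normalize and correct_street helpers, and the uppercase convention are assumptions for illustration, not the StreetMisspelling model itself.

```python
# Hypothetical alias table: incorrect spelling -> correct spelling.
MISSPELLINGS = {
    "GLEN HAVEN": "GLENHAVEN",
}


def normalize(name):
    """Uppercase the name and collapse runs of whitespace."""
    return " ".join(name.upper().split())


def correct_street(name, aliases=MISSPELLINGS):
    """Return the corrected street name, or the normalized input if no alias exists."""
    key = normalize(name)
    return aliases.get(key, key)


print(correct_street("Glen  Haven"))
print(correct_street("Franklin"))
```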
Multiple Cities
By default, OpenBlock is configured to work with a single city, which is defined in METRO_LIST:
# Metros. You almost certainly only want one dictionary in this list.
# See the configuration docs for more info.
METRO_LIST = (
    {
        # Extent of the region, as a longitude/latitude bounding box.
        'extent': (-79.165922, 35.829095, -78.978468, 36.02426),
        # The major city in the region.
        'city_name': 'Chapel Hill',
    },
)
The geocoder will fail if it locates a street that's associated with a city unknown to OpenBlock. For example, 100 Pine Street is in Carrboro and not Chapel Hill:
This street exists in the database because our extent covers most of Orange County. Since we've set up OpenBlock to encompass an entire county, rather than a single city, we need to define additional cities. This can be accomplished in one of two ways:
- Add additional dictionaries to METRO_LIST for each city
- Import city locations into the database and tell OpenBlock to refer to these
We imported Orange County city boundary data above, so we'll use the latter:
METRO_LIST = (
    {
        # Extent of the region, as a longitude/latitude bounding box.
        'extent': (-79.165922, 35.829095, -78.978468, 36.02426),
        # Set this to True if the region has multiple cities.
        # You will also need to set 'city_location_type'.
        'multiple_cities': True,
        # The major city in the region.
        'city_name': 'Chapel Hill',
        # Slug of an ebpub.db.LocationType that represents cities.
        # Only needed if multiple_cities = True.
        'city_location_type': 'cities',
    },
)
Here we enabled multiple_cities and told OpenBlock that the slug of the city location type is cities. Now 100 Pine Street will geocode properly:
What's Next
Now that we've had an overview of the geocoder, we'll jump into OpenBlock's place, location, and address parser. Stay tuned!
Update: Read more in OpenBlock Geocoder, Part 2: Text Parsing and Entity Extraction.
April 22 2010 by Colin Copeland
Deployment is usually a tedious process with lots of tinkering until everything is set up just right. We deploy quite a few Django sites on a regular basis here at Caktus and still do tinkering, but we've attempted to functionalize some of the core tasks to ease the process. I've put together a basic example that outlines local and remote environment setup. This is a simplified example and just one of many ways to deploy a Django project (I learned a lot from Jacob Kaplan-Moss' django-deployment-workshop), so I encourage you to browse around the Django community to learn more.
The entire source for this example project can be found in the caktus-deployment Bitbucket repository.
Local Development Environment
The project directory is organized like so:
caktus_website/
    __init__.py
    apache/
        staging.conf -- staging Apache conf
        staging.wsgi -- staging wsgi file
    blog/
    bootstrap.py -- bootstrap local environment
    fabfile.py -- manage remote environments with fabric
    local_settings.py
    manage.py
    media/
    requirements/
        apps.txt -- pip requirements file
    settings.py
    settings_staging.py -- staging settings file
    urls.py
To set up a local development environment, we'll create a virtual environment and run bootstrap.py, which is just a simple script that automates installing Python dependencies using pip:
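import os
import subprocess
import sys

# parser is defined elsewhere in bootstrap.py (not shown here)
if "VIRTUAL_ENV" not in os.environ:
    sys.stderr.write("$VIRTUAL_ENV not found.\n\n")
    parser.print_usage()
    sys.exit(-1)
virtualenv = os.environ["VIRTUAL_ENV"]
file_path = os.path.dirname(__file__)
# install everything listed in requirements/apps.txt into the active virtualenv
subprocess.call(["pip", "install", "-E", virtualenv, "--requirement",
                 os.path.join(file_path, "requirements/apps.txt")])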
bootstrap.py uses requirements/apps.txt (a pip requirements file), so you can source anything off of PyPI as well as mercurial, git, and SVN repositories that include setup.py files. In this example, django's SVN is the only dependency in apps.txt:
-e svn+http://code.djangoproject.com/svn/django/branches/releases/1.1.X#egg=django
bootstrap.py must be run within a virtual environment, so let's create a new virtualenv (I recommend using virtualenvwrapper) and then run bootstrap.py to install the dependencies:
copelco@montgomery:~/caktus_website$ mkvirtualenv --distribute caktus
(caktus)copelco@montgomery:~/caktus_website$ ./bootstrap.py
Now that our environment is set up (and Django is on the Python path), we can run normal Django management commands:
(caktus)copelco@montgomery:~/caktus_website$ ./manage.py syncdb --settings=caktus_website.local_settings
(caktus)copelco@montgomery:~/caktus_website$ ./manage.py runserver --settings=caktus_website.local_settings
Great! That's it for our local setup. Let's look into deploying the project to a staging server.
Deployment and Remote Management
To help provision the remote server environment (in this case Ubuntu 9.10), we'll use fabric. fabric allows you to streamline deployment by functionalizing common tasks in Python. I've created an example fabfile.py to help bootstrap and deploy the project:
(caktus)copelco@montgomery:~/caktus_website$ fab --list
Available commands:
    apache_reload        reload Apache on remote host
    apache_restart       restart Apache on remote host
    bootstrap            initialize remote host environment (virtualenv, dep...
    configtest           test Apache configuration
    create_virtualenv    setup virtualenv on remote host
    deploy               rsync code to remote host
    production           use production environment on remote host
    staging              use staging environment on remote host
    symlink_django       create symbolic link so Apache can serve django adm...
    touch                touch wsgi file to trigger reload
    update_apache_conf   upload apache configuration to remote host
    update_requirements  update external dependencies on remote host
The fabfile splits the deployment process into discrete steps of 1) virtual environment creation, 2) code transfer, and 3) updating the Python dependencies. The bootstrap command wraps everything together, including initial directory creation, so you can set up the server quickly:
import os

from fabric import utils
from fabric.api import cd, env, require, run
from fabric.contrib import console
from fabric.contrib.project import rsync_project

# RSYNC_EXCLUDE and touch() are defined elsewhere in fabfile.py


def bootstrap():
    """ initialize remote host environment (virtualenv, deploy, update) """
    require('root', provided_by=('staging', 'production'))
    run('mkdir -p %(root)s' % env)
    run('mkdir -p %s' % os.path.join(env.home, 'www', 'log'))
    create_virtualenv()
    deploy()
    update_requirements()


def create_virtualenv():
    """ setup virtualenv on remote host """
    require('virtualenv_root', provided_by=('staging', 'production'))
    args = '--clear --distribute'
    run('virtualenv %s %s' % (args, env.virtualenv_root))


def deploy():
    """ rsync code to remote host """
    require('root', provided_by=('staging', 'production'))
    if env.environment == 'production':
        # ask before touching production; the default answer is "no"
        if not console.confirm('Are you sure you want to deploy production?',
                               default=False):
            utils.abort('Production deployment aborted.')
    extra_opts = '--omit-dir-times'
    rsync_project(
        env.root,
        exclude=RSYNC_EXCLUDE,
        delete=True,
        extra_opts=extra_opts,
    )
    touch()


def update_requirements():
    """ update external dependencies on remote host """
    require('code_root', provided_by=('staging', 'production'))
    requirements = os.path.join(env.code_root, 'requirements')
    with cd(requirements):
        cmd = ['pip install']
        cmd += ['-E %(virtualenv_root)s' % env]
        cmd += ['--requirement %s' % os.path.join(requirements, 'apps.txt')]
        run(' '.join(cmd))
To bootstrap the staging environment, run:
(caktus)copelco@montgomery:~/caktus_website$ fab staging bootstrap
This will run a few commands over SSH and rsync the project directory to a specific location on the staging server. rsync is just one of many ways to transfer code to the server; pulling code from a remote repository is another. The deploy command in the fabfile can be modified to perform almost any transfer task. Once the bootstrap process is complete, the directory structure will look like so:
home/
    caktus/
        www/
            staging/
                env/ -- virtual environment
                    bin/
                    include/
                    lib/ -- contains site-packages
                    source/ -- contains django src
                caktus_website/
                    ...
                    apache/
                    manage.py
                    requirements/
                    ...
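The layout above follows from a handful of path joins. The sketch below shows one way the staging and production commands could derive the values that the fabfile functions require (root, virtualenv_root, code_root). The environment_paths helper and the log_dir key are hypothetical; the actual fabfile in the repository may compute these differently.

```python
import posixpath


def environment_paths(home, environment, project):
    """Derive the remote paths for one environment (e.g. 'staging')."""
    root = posixpath.join(home, "www", environment)
    return {
        "root": root,                                    # holds env/ and the project
        "virtualenv_root": posixpath.join(root, "env"),  # virtual environment
        "code_root": posixpath.join(root, project),      # rsync target for the code
        "log_dir": posixpath.join(home, "www", "log"),   # shared Apache log directory
    }


paths = environment_paths("/home/caktus", "staging", "caktus_website")
for name, value in sorted(paths.items()):
    print(name, value)
```

posixpath is used instead of os.path so the remote paths come out with forward slashes regardless of the operating system running the fabfile.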
Now SSH to the server and run syncdb within the newly created virtual environment:
caktus@pike:~/www/staging/caktus_website$ source ../env/bin/activate
(env)caktus@pike:~/www/staging/caktus_website$ ./manage.py syncdb --settings=caktus_website.settings_staging
The staging settings file is set up to use sqlite3 to simplify this deployment example. In practice we use PostgreSQL in our production environments, but database setup is for another blog post! To get Apache configured using mod_wsgi, we'll point the Apache configuration to the staging.wsgi file using the WSGIScriptAlias directive. Here's an example Apache configuration to get a barebones Django environment up and running:
<VirtualHost *:80>
    WSGIScriptReloading On
    WSGIReloadMechanism Process
    WSGIDaemonProcess caktus_website-staging
    WSGIProcessGroup caktus_website-staging
    WSGIApplicationGroup caktus_website-staging
    WSGIPassAuthorization On
    WSGIScriptAlias / /home/caktus/www/staging/caktus_website/apache/staging.wsgi/
    <Location "/">
        Order Allow,Deny
        Allow from all
    </Location>
    <Location "/media">
        SetHandler None
    </Location>
    Alias /media /home/caktus/www/staging/caktus_website/media
    <Location "/admin-media">
        SetHandler None
    </Location>
    Alias /admin-media /home/caktus/www/staging/caktus_website/media/admin
    ErrorLog /home/caktus/www/log/error.log
    LogLevel info
    CustomLog /home/caktus/www/log/access.log combined
</VirtualHost>
We'll use Apache to serve static media (both local and admin media) and direct everything else to the Django instance through mod_wsgi. In order for the wsgi instance to be aware of our environment and project directory, we need to add the virtual environment's site-packages directory and the project directory to the Python path, and tell Django which settings file to use by setting the DJANGO_SETTINGS_MODULE environment variable:
import os
import sys
import site
PROJECT_ROOT = os.path.dirname(os.path.dirname(os.path.dirname(__file__)))
site_packages = os.path.join(PROJECT_ROOT, 'env/lib/python2.6/site-packages')
site.addsitedir(os.path.abspath(site_packages))
sys.path.insert(0, PROJECT_ROOT)
os.environ['DJANGO_SETTINGS_MODULE'] = 'caktus_website.settings_staging'
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
Now just upload the staging apache configuration and reload apache:
(caktus)copelco@montgomery:~/caktus_website$ fab staging update_apache_conf
That's it! The site should be up and running on your server's public IP. If you run into any trouble (like a 500 Internal Server Error), just tail the Apache error.log. It'll usually point you in the right direction.
August 13 2009 by Colin Copeland
If you've ever tried to increase the shared_buffers setting in your postgresql.conf to a value that exceeds the amount of shared memory supported by your operating system kernel, then you'll see an error message like this:
copelco@montgomery:~$ /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data
2009-07-10 10:14:04 EDTFATAL: could not create shared memory segment: Invalid argument
2009-07-10 10:14:04 EDTDETAIL: Failed system call was shmget(key=5432001, size=142516224, 03600).
2009-07-10 10:14:04 EDTHINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter. You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 142516224 bytes), reduce PostgreSQL's shared_buffers parameter (currently 16384) and/or its max_connections parameter (currently 23).
If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter, in which case raising the request size or reconfiguring SHMMIN is called for.
The PostgreSQL documentation contains more information about shared memory configuration.
The default shared_buffers value is low (for legacy reasons). If you increase it, PostgreSQL may request a shared memory segment that exceeds your kernel's SHMMAX parameter. You can see the current values like so:
copelco@montgomery:~$ sysctl kern.sysv.shmmax
kern.sysv.shmmax: 4194304
copelco@montgomery:~$ sysctl kern.sysv.shmall
kern.sysv.shmall: 1024
Section 17.4, Managing Kernel Resources, of the PostgreSQL documentation outlines methods to set the values permanently, but you can play around with the values temporarily (until restart) on the command line like so:
copelco@montgomery:~$ sudo sysctl -w kern.sysv.shmmax=1073741824
kern.sysv.shmmax: 4194304 -> 1073741824
copelco@montgomery:~$ sudo sysctl -w kern.sysv.shmall=1073741824
kern.sysv.shmall: 1024 -> 1073741824
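To put numbers on the error above: the failed request was 142516224 bytes against a shmmax of 4194304 bytes. Here is a small sketch of the arithmetic, assuming that kern.sysv.shmall is counted in 4 kB pages while shmmax is in bytes (as the PostgreSQL documentation describes for this platform). The helper names are made up for illustration.

```python
PAGE_SIZE = 4096  # bytes per page; assumed 4 kB


def fits(request_bytes, shmmax_bytes):
    """True when a single shared memory segment request fits under shmmax."""
    return request_bytes <= shmmax_bytes


def pages_needed(size_bytes, page_size=PAGE_SIZE):
    """Number of whole pages required to hold size_bytes (ceiling division)."""
    return -(-size_bytes // page_size)


request = 142516224  # request size reported in the log above

print(fits(request, 4194304))      # against the original shmmax
print(fits(request, 1073741824))   # against the raised shmmax
print(pages_needed(request))       # pages PostgreSQL's request occupies
print(pages_needed(1073741824))    # pages needed to cover a 1 GB shmmax
```

Under that assumption the original shmall of 1024 pages equals exactly the original 4 MB shmmax, and a shmall of 262144 pages would already cover a 1 GB shmmax.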
Once you have working values, you can fire up PostgreSQL (I've been happy with the kyngchaos distribution) with a LaunchDaemon file and launchd:
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>org.postgresql.postgres</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/pgsql/bin/postmaster</string>
        <string>-D</string>
        <string>/usr/local/pgsql/data</string>
    </array>
    <key>RunAtLoad</key>
    <true></true>
    <key>UserName</key>
    <string>copelco</string>
</dict>
</plist>
And the launchd commands:
copelco@montgomery:~$ sudo launchctl unload /Library/LaunchDaemons/org.postgresql.postgres.plist
copelco@montgomery:~$ sudo launchctl load /Library/LaunchDaemons/org.postgresql.postgres.plist
May 26 2009 by Colin Copeland
By default, Django doesn't do explicit table locking. This is OK for most read-heavy scenarios, but sometimes you need guaranteed, exclusive access to the data. Caktus uses PostgreSQL in most of our production environments, so we can use the various lock modes it provides to control concurrent access to the data. Once we obtain a lock in PostgreSQL, it is held for the remainder of the current transaction. Django provides transaction management, so all we need to do is execute a SQL LOCK statement within a transaction, and Django and PostgreSQL will handle the rest.
Below is an example decorator we came up with to provide easy table-locking access in Django:
from django.db import transaction

LOCK_MODES = (
    'ACCESS SHARE',
    'ROW SHARE',
    'ROW EXCLUSIVE',
    'SHARE UPDATE EXCLUSIVE',
    'SHARE',
    'SHARE ROW EXCLUSIVE',
    'EXCLUSIVE',
    'ACCESS EXCLUSIVE',
)


def require_lock(model, lock):
    """
    Decorator for PostgreSQL's table-level lock functionality

    Example:
        @transaction.commit_on_success
        @require_lock(MyModel, 'ACCESS EXCLUSIVE')
        def myview(request):
            ...

    PostgreSQL's LOCK Documentation:
    http://www.postgresql.org/docs/8.3/interactive/sql-lock.html
    """
    def require_lock_decorator(view_func):
        def wrapper(*args, **kwargs):
            if lock not in LOCK_MODES:
                raise ValueError('%s is not a PostgreSQL supported lock mode.' % lock)
            from django.db import connection
            cursor = connection.cursor()
            # the lock is held until the surrounding transaction ends
            cursor.execute(
                'LOCK TABLE %s IN %s MODE' % (model._meta.db_table, lock)
            )
            return view_func(*args, **kwargs)
        return wrapper
    return require_lock_decorator
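One possible refinement, sketched here without Django so it runs on its own: validate the lock mode once, when the decorator is applied, and keep the wrapped function's name with functools.wraps. The execute argument is a hypothetical callable standing in for cursor.execute, and the table name is made up.

```python
import functools

LOCK_MODES = (
    'ACCESS SHARE', 'ROW SHARE', 'ROW EXCLUSIVE', 'SHARE UPDATE EXCLUSIVE',
    'SHARE', 'SHARE ROW EXCLUSIVE', 'EXCLUSIVE', 'ACCESS EXCLUSIVE',
)


def lock_statement(table, mode):
    """Build the LOCK statement, rejecting unknown lock modes."""
    if mode not in LOCK_MODES:
        raise ValueError('%s is not a PostgreSQL supported lock mode.' % mode)
    return 'LOCK TABLE %s IN %s MODE' % (table, mode)


def require_lock(table, mode, execute):
    """Decorator factory; `execute` stands in for cursor.execute."""
    sql = lock_statement(table, mode)  # fails at decoration time on a bad mode

    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            execute(sql)
            return func(*args, **kwargs)
        return wrapper
    return decorator


executed = []  # records the SQL that would have been sent to the database


@require_lock('myapp_mymodel', 'ACCESS EXCLUSIVE', executed.append)
def update_totals():
    return 'done'


result = update_totals()
print(result, executed)
```

Failing at decoration time means a typo in the lock mode surfaces when the module is imported rather than on the first request that reaches the view.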
This is, by no means, a perfect solution. Feel free to comment below.
January 13 2009 by Tobias McNulty
Here at Caktus, we use the popular Django web framework for a lot of our custom web application development. We don't use Django simply because it's popular, easy to learn, or happened to be the first thing we found. We've written web apps in PHP, Java, and Ruby on Rails--all before we discovered Django--but were never quite satisfied. Following are just a few of the reasons that we both enjoy working with Django and believe it gives you (the client) the best end-product.
Django is Business-Friendly
Django is open source, free, and published under a "do anything you like" license, so it can be used to create all kinds of products, including proprietary business web apps. In addition to a flexible license, Django has a truly thriving user community and is being constantly improved by web developers like ourselves across the globe.
Built-in Admin Interface
Web application development often starts with the "data model." A data model defines the ways in which all the different pieces of information--such as customer names and addresses or product descriptions--are organized and related to each other in the database. Finding the right data model takes time and it's important to get it right, because a lot of development decisions will be based on the way your information is organized and accessible. When you're building a web application from the ground up--something we do every day at Caktus--you want the flexibility to experiment with your data model and "see" what all the different options look like.
This is where Django's built-in admin interface comes in. From the beginning, Django has included an automatically generated interface that lets you see and edit what's in your database. It knows the structure of your data and puts together a set of search and listing pages and custom web forms for creating, modifying, and otherwise managing your data. It lets you evaluate your data model up front before making a big investment in other parts of your web app. For some sites, the admin interface even makes up a big part of the final product (e.g., for sites that primarily publish content, such as news organizations). And, we've found, the automatically generated admin interface is a powerful tool for showing potential clients what a web app can do.
I Trust Django With My Data
At Caktus we put a strong emphasis on "data integrity." What is data integrity? Kevin already wrote a great post about what it is and why you should care about data integrity. In a nutshell, the "integrity" of data refers to its "completeness" or validity as a whole. For example, you probably want to limit the products that people can order on your web site to those that you actually stock in the warehouse.
Modern "relational database management systems" provide integrity "checks" for your data that verify its appropriateness--based on the conditions you supply--for a given table in the database. When you build a data model in Django, you specify the nature or "type" of each column in your database and can even specify "constraints" on the data that--if your database server supports it--will be enforced at the database level in addition to the application. While this is always a good thing, it's even more important if other programs or users will be connecting to your database in addition to your web app. While Django does this out of the box, another popular web framework requires some under the hood "hacking" to achieve the same peace of mind about your data.
On a side note, in addition to preferring Django for web app development, Caktus also prefers PostgreSQL for data storage. Our friends over at Summersault have already written a good summary describing why PostgreSQL is often the best choice for web app development, so I won't repeat the reasons here. We trust the Django + PostgreSQL combination so much that we even wrote our own CRM and bookkeeping package to keep track of our clients, projects, and all the related financial transactions.
Django is Written in Python
Python is a great language with no shortage of facilities and a huge (and growing) user base. A lot of Google's infrastructure is written in Python, and it is the only language supported by the initial release of their App Engine service. According to python.org:
[Python] offers strong support for integration with other languages and tools, comes with extensive standard libraries, and can be learned in a few days. Many Python programmers report substantial productivity gains and feel the language encourages the development of higher quality, more maintainable code.
Based on Caktus' experience writing Django web apps over the past 1.5+ years, this couldn't be more true.
Separation of Application Components
Django uses a variation of the Model View Controller (MVC) architecture that ensures all the different pieces of your application end up in the right place and, for larger projects, lets people with different skills work on the things they do best, without getting in each other's way. Moreover, Django implements its own very simple "template language" for generating web pages. While some may view its simplicity as a curse, it is actually a blessing in disguise: by allowing only very simple constructs in the template, Django forces you to keep your business logic in the controller (what Django calls a "view") where it belongs.
At Caktus, we're not just web developers. We're web engineers with a passion for web apps that not only work, feel, and look great, but also have the capacity to grow, improve, and continue to perform long into the future without breaking the bank. That said, we're truly thrilled about the Python/Django + PostgreSQL combination.
January 07 2009 by Colin Copeland
Caktus released minibooks (open-sourced under the AGPL) as a bookkeeping package for small tech agencies. Boasting a double-entry accounting system, customer relationship management (CRM) and transaction reconciliation, minibooks provides a clean, multiuser web-based interface to manage simple accounting needs for small businesses.
minibooks was originally developed out of our frustration with single-user, desktop-oriented accounting packages like QuickBooks. We wanted a team-accessible system, so everyone on our VPN could access the CRM and manage bookkeeping tasks remotely. So Caktus developed a lightweight web app using Django and PostgreSQL to handle our basic needs. We use it every day at Caktus, so we're continually improving it and adding features (most recently recurring transactions and flags to monitor delivered and undelivered exchanges) to make things easier.
I'd like to spend some time highlighting a few of the many great features found in minibooks:
ExchangeType Model
Invoices, Receipts, Orders, Purchases, etc., are represented by a single, generalized Exchange model in minibooks. At a very low level, they all do the same thing (record an exchange between two entities), but vary slightly in their characteristics. So, they all reside in the same database table and are distinguished by their type. Exchanges have a foreign key relationship to the ExchangeType model.
ExchangeTypes provide a powerful interface to streamline repetitive tasks and customize interface elements based on sets of characteristics. For example, if most purchases are made through your credit card, you can create a Purchase ExchangeType to credit your Credit Card Account by default while still allowing you to debit each item on the receipt to a separate account (e.g., Food Expenses, Office Supplies, etc.). Invoices can also use this system with accounts receivable (accrual accounting). Simply set up an Invoice ExchangeType with Accounts Receivable as the common account and separate income credit accounts (e.g., Consulting, Hosting, etc.) will be available for each item on the invoice.
Quick Search

minibooks' CRM stores all of the contacts, businesses and projects associated with Caktus. This way everything is consolidated in one spot, from a client phone number to a project invoice. Being able to jump quickly is a necessity, so we created an AJAX auto-complete search bar that's displayed at the top of every page. Need to enter a receipt from the client meeting at the coffee shop or quickly find a phone number? Just type the first few letters and minibooks will search project, business and customer names and emails and return a list of matches instantly. Then just arrow down and hit return or click the one you're looking for. Better yet, the Quick Search field is accessible through the "f" access key, so hit Control+Alt+F on a Mac or Shift+Alt+F on Linux (and Windows?) and you'll never have to leave the keyboard!
Transaction Reconciliation

Balancing your checkbook can be a slow and tedious process, so we tried to ease the process with your business accounts. When a monthly credit card statement arrives, jump over to the Accounts tab and go to the credit card ledger. Here you'll find all credits (purchases) and debits (payments) ordered by date alongside a running total. Once you OK the amount on the credit card statement, check the checkbox next to that transaction and, using AJAX, minibooks will update the current reconciled balance! Now you can make sure your bookkeeping records always match your bank and credit card statements with ease.
Recurring Exchanges
Caktus provides web and email hosting services for our clients, so we wanted an easy way to automatically generate invoices for recurring services. The exchange creation form has the option to repeat items based on a set interval (days, weeks, months and years). Just set up cron to access /ledger/cron/ every night and all of your recurring exchanges will be automatically generated. An itemized email of generated invoices and other exchanges will be sent to you as well.
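To give a feel for what a nightly job has to compute, here is a minimal sketch of advancing a date by an interval of days, weeks, months or years using only the standard library. It is not minibooks' code: the next_occurrence function and its rule of clamping to the last day of a shorter month are assumptions made for illustration.

```python
import calendar
from datetime import date, timedelta


def next_occurrence(current, count, unit):
    """Return the date `count` units after `current`."""
    if unit == "days":
        return current + timedelta(days=count)
    if unit == "weeks":
        return current + timedelta(weeks=count)
    if unit in ("months", "years"):
        months = count * (12 if unit == "years" else 1)
        # Count months from year 0 so the year rolls over automatically.
        index = current.year * 12 + (current.month - 1) + months
        year, month = divmod(index, 12)
        month += 1
        # Clamp the day for shorter months (e.g. Jan 31 -> Feb 28).
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, min(current.day, last_day))
    raise ValueError("unknown interval unit: %s" % unit)


print(next_occurrence(date(2009, 1, 7), 2, "weeks"))
print(next_occurrence(date(2009, 1, 31), 1, "months"))
print(next_occurrence(date(2008, 2, 29), 1, "years"))
```

A job like this would generate an exchange whenever the computed date is on or before today, then store the following occurrence.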
LaTeX Exchange and Report Generation
minibooks uses LaTeX to generate PDFs for exchanges (Invoices, Receipts, Member Distributions, etc.) and project reports. LaTeX files fit nicely into the Django template system, so you can use Django variables, tags and filters right in your .tex document. minibooks includes the template we use here at Caktus by default for you to use, but the possibilities are endless, so feel free to create your own and style your invoices any way you like! Further, the CRM automatically attaches the PDFs when sending exchanges to clients through minibooks.
Learn More
These are just some of the features found in minibooks. The source code and installation instructions can be found on the minibooks project page. An online demo (login:demo@minibooksdemo.com, pass:demo) is available for testing as well. minibooks is far from feature complete, so feel free to hack away at it and add features as you see fit!
February 22 2008 by Kevin Hunter
One problem with marketing is that it introduces pseudo-false concepts, arbitrarily divorces necessarily wed ones, and leaves out all the gory details. We recently had a client ask us,
Why would we do our data model/data base outside of MySQL and not have our software make calls upon it? I thought one used MySQL to manage the data and the database, and used software to call, update and display it …
The answer, of course, is "it depends." However, given the context of the situation with our client, whose project absolutely mandated a database, this highlighted a couple of large misconceptions. The first and foremost is that a data model and database are the same thing. They are not. A data model is a description of how a system will handle data. A database is how a system enforces that description. The difference is subtle, but significant.
For example, if I put 500 USD into my bank account, I expect that when I want it later, the funds are still there. But what if I made a mistake and put a 1 instead of a 5 for the account number? Or what if the teller can't read my handwriting and puts down a 5 instead of 6? What's the guard to make sure that I won't lose my money to some lucky schmo? In this hypothetical example, the bank database would see my name on the transaction and raise a flag that it didn't match the number the teller typed. That's the job of the database, to make sure that the constituent (me) is correctly tied to related items (500 USD, account number, name). If I had just put money in and there was no database, there would just be one record among millions that showed that I gave some random account a gift. Good luck proving that I did not mean to do that.
But let's return to "it depends." It depends on what one needs to do with the data. If one merely needs a log or list of data produced, one will not need the complexity of a Relational Database Management System (RDBMS). On the other hand, the minute one wants to do something fun, like see who accessed two different pages of a site on the 1st and 5th of each month, between the hours of 3p and 8p from certain departments of 4 different companies, one wants an RDBMS. As the name "Relational" implies, a well designed data model in the hands of an RDBMS makes extremely difficult and perhaps random questions easy, or at least possible.
The second misconception lay in his confusion about keeping his project outside of the database. In his mind at the time, MySQL was it. Similar to how most people don't know that Microsoft Windows is a choice, he thought MySQL was the only database. When he heard us drop words like "SQLite," "Postgres," and "datastore" around, he was confused. There exist today a plethora of databases, each with its own strengths and weaknesses. To name drop a few: DB2, Derby, Firebird, MySQL, Oracle, Postgres, SQLite. There are others, too (hint: google for 'database comparison wikipedia'). Each of these targets a specific market segment. I won't get into it here, but knowing the right one to pick for your needs is difficult at best.
Overall, I absolutely love databases. They are supremely excellent beasts. Properly designed, databases help your company enforce all the nitty-gritty details, and pull all kinds of interesting information from your data. However, like any tool, use it for the jobs at which it excels and not for the jobs at which it fails. If all you need is a storage device, use a simple text file or spreadsheet. If you need long-term data integrity and powerful information scraping capabilities, absolutely use a database.