December 28, 2011
by Colin Copeland
Categories:
Technical

OpenBlock Geocoder, Part 3: External Geocoders

The OpenBlock geocoder is powerful and robust. It uses PostGIS for spatial queries, can extract addresses from bodies of text, and understands block and intersection notation. We've run into a few issues with it, however, including a low geocoding success rate. This is a tough problem to solve, and results depend on many factors (the extent of street and block data loaded into OpenBlock, the format of the street addresses, etc.), so your mileage may vary. Below, I construct a simple test that uses Google's Geocoding API as an alternative.

Disclaimer: This is the third post in our OpenRural series reviewing OpenBlock and its geocoder. You may wish to read Part 1: Data Model and Geocoding and Part 2: Text Parsing and Entity Extraction before proceeding.

Adding news with OpenBlock's geocoder

The Schema and NewsItem models provide OpenBlock with a generic data model to associate news with geographic locations. You can find a fairly extensive introduction in the official documentation, so we won't go into too much detail here.

Since a NewsItem requires a geographic point, let's use the OpenBlock geocoder to find 123 East Franklin Street:

>>> from ebpub.geocoder import SmartGeocoder
>>> geocoder = SmartGeocoder()
>>> location_name = '123 East Franklin Street'
>>> point = geocoder.geocode(location_name)['point']
>>> point.wkt
'POINT (-79.0553588124999891 35.9133110937499964)'

You'll notice that point has a wkt attribute. WKT, or well-known text, is a text markup language for representing geometry objects. Here we have a POINT, but the language can represent many other geometries, including LINESTRING and POLYGON. Note that the coordinates are in x, y order, so longitude comes before latitude.
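
To make the WKT format more concrete, here is a minimal sketch that builds WKT strings by hand. The helper names (point_wkt, linestring_wkt, polygon_wkt) are made up for illustration and are not part of OpenBlock or GeoDjango; in practice the wkt attribute does this for you.

```python
# Hypothetical helpers (not part of OpenBlock or GeoDjango) that build WKT
# strings by hand. Coordinates are (x, y) pairs, i.e. (longitude, latitude).

def _coords(points):
    """Join (x, y) pairs into the 'x y, x y' form that WKT uses."""
    return ', '.join('{0} {1}'.format(x, y) for x, y in points)


def point_wkt(x, y):
    return 'POINT ({0} {1})'.format(x, y)


def linestring_wkt(points):
    return 'LINESTRING ({0})'.format(_coords(points))


def polygon_wkt(ring):
    # A polygon ring must be closed: the first and last coordinates match.
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    # The double parentheses hold the outer ring; holes would follow it.
    return 'POLYGON (({0}))'.format(_coords(ring))
```

For example, point_wkt(-79.05, 35.91) gives a string of the same shape as the geocoder result above, only with fewer decimal places.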

We'll use the "Local News" schema in this example as it is pre-loaded in OpenBlock:

>>> from ebpub.db import models as ebpub
>>> schema = ebpub.Schema.objects.get(name='Local News')

Using this schema, we'll add a new NewsItem with the point created above:

>>> import datetime
>>> news = schema.newsitem_set.create(
...     title='Incident downtown',
...     description='Something happened downtown today!',
...     item_date=datetime.date.today(),
...     location=point,
...     location_name=location_name,
... )
>>> news.location.wkt
'POINT (-79.0553588124999891 35.9133110937499964)'

That was easy. Now we have a NewsItem that OpenBlock is aware of and that can be plotted on a map. But what do we do if OpenBlock can't geocode the address?

Using an external geocoder

If we already have a geographic point, then we can circumvent the geocoder entirely:

>>> from django.contrib.gis.geos import Point
>>> manual_point = Point(-79.0553588124999891, 35.9133110937499964)
>>> news = schema.newsitem_set.create(
...     title='Incident downtown',
...     description='Something happened downtown today!',
...     item_date=datetime.date.today(),
...     location=manual_point,
...     location_name=location_name,
... )
>>> news.location.wkt
'POINT (-79.0553588124999891 35.9133110937499964)'

This means we can also use an external geocoder. For example, we can use Google's Geocoding API through geopy. First, you'll need a Google Maps API key:

>>> GOOGLE_MAPS_API_KEY = '' # your Google Maps API key

Then we can use geopy to construct a new geocoder:

>>> from geopy import geocoders
>>> g = geocoders.Google(GOOGLE_MAPS_API_KEY)

And we can geocode our address:

>>> address = '123 East Franklin Street, Chapel Hill, NC'
>>> place, (lat, lng) = g.geocode(address)
>>> point = Point(lng, lat)
>>> point.wkt
'POINT (-79.0549350000000004 35.9136495999999994)'
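
The point Google returns is close to, but not identical to, the one OpenBlock produced earlier. One way to judge whether two geocoders agree is to measure the distance between their results. The sketch below uses the haversine formula with only the standard library; the function name haversine_m and the spherical Earth radius are assumptions for illustration, not anything OpenBlock or geopy provides.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (spherical approximation)


def haversine_m(lng1, lat1, lng2, lat2):
    """Great-circle distance in metres between two lng/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# (x, y) = (longitude, latitude), copied from the two WKT results in this post.
openblock_xy = (-79.0553588124999891, 35.9133110937499964)
google_xy = (-79.0549350000000004, 35.9136495999999994)

distance = haversine_m(openblock_xy[0], openblock_xy[1],
                       google_xy[0], google_xy[1])
print('Results differ by about %.0f metres' % distance)
```

For these two points the difference is a few tens of metres, which is small enough for plotting news items on a city map.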

You can even tap into OpenBlock's internals and build a Geocoder that OpenBlock can use:

from django.conf import settings
from django.contrib.gis.geos import Point

from geopy import geocoders
from geopy.geocoders.google import GQueryError

from ebpub.geocoder import Geocoder, DoesNotExist


class GoogleGeocoder(Geocoder):

    def __init__(self, *args, **kwargs):
        kwargs['use_cache'] = False # haven't implemented cache yet
        super(GoogleGeocoder, self).__init__(*args, **kwargs)
        self.geocoder = geocoders.Google(settings.GOOGLE_MAPS_API_KEY)

    def _do_geocode(self, location_string):
        try:
            place, (lat, lng) = self.geocoder.geocode(location_string)
        except (GQueryError, ValueError) as e:
            raise DoesNotExist(unicode(e))
        location = {'point': Point(lng, lat)}
        return location
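
One way to combine the two geocoders is to try OpenBlock's first and fall back to the external one only when the lookup fails. The following is a rough sketch of that idea using plain callables and stub geocoders so it runs on its own. GeocodeFailed, geocode_with_fallback, and both stubs are hypothetical names; real code would pass in the geocode methods of SmartGeocoder and GoogleGeocoder and catch whatever exceptions those actually raise.

```python
class GeocodeFailed(Exception):
    """Stand-in for a geocoder's 'address not found' exception."""


def geocode_with_fallback(location_string, geocoders):
    """Try each geocoder callable in order and return the first result."""
    errors = []
    for geocode in geocoders:
        try:
            return geocode(location_string)
        except GeocodeFailed as e:
            errors.append(str(e))  # remember why this geocoder failed
    raise GeocodeFailed('; '.join(errors) or 'no geocoders configured')


# Stubs standing in for the local and the external geocoder.
def local_stub(location_string):
    raise GeocodeFailed('not in local street data')


def external_stub(location_string):
    return {'point': (-79.054935, 35.9136496)}


result = geocode_with_fallback('123 East Franklin Street',
                               [local_stub, external_stub])
```

Trying the local geocoder first keeps requests to the external service to a minimum, since it is only consulted for addresses the local data cannot resolve.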

This is a proof-of-concept geocoder we're using with OpenRural. You can find it on GitHub. Using this geocoder with a sample dataset from the North Carolina Secretary of State Corporation Filings, I was able to increase the geocoding success rate from about 37% to 95%. Again, your mileage will vary, but it's worth testing against your own data. We can't use Google's API for everything, though. Normal users are limited to 2,500 requests per day, and business accounts are allotted 100,000 requests per day. Additionally, Google requires you to display any points geocoded with their API on a Google Map. So you'll need to evaluate your needs before deciding to use Google's API.
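
If you want to measure the success rate on your own data while staying under a daily request limit, something like the following sketch could work. The function success_rate and the stub geocoder are hypothetical; the default limit of 2,500 comes from the figure quoted above. Catching Exception is deliberately broad here, and real code should catch only the geocoder's own errors.

```python
def success_rate(addresses, geocode, daily_limit=2500):
    """Return (successes, attempts), stopping once daily_limit is reached."""
    successes = attempts = 0
    for address in addresses:
        if attempts >= daily_limit:
            break  # stop before exceeding the request quota
        attempts += 1
        try:
            geocode(address)
        except Exception:  # assumption: any exception means "not geocoded"
            continue
        successes += 1
    return successes, attempts


# Stub geocoder that only knows one address, standing in for a real one.
KNOWN = {'123 East Franklin Street'}


def stub_geocode(address):
    if address not in KNOWN:
        raise LookupError(address)
    return {'point': (-79.054935, 35.9136496)}


successes, attempts = success_rate(
    ['123 East Franklin Street', 'nowhere in particular'], stub_geocode)
print('%d of %d addresses geocoded' % (successes, attempts))
```

Running the same address list through two geocoders this way gives a like-for-like comparison of their success rates.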
