<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Scraping on Caktus Group</title><link>https://www.caktusgroup.com/tags/scraping/</link><description>Recent content in Scraping on Caktus Group</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 16 Dec 2011 13:30:19 +0000</lastBuildDate><atom:link href="https://www.caktusgroup.com/tags/scraping/index.xml" rel="self" type="application/rss+xml"/><item><title>OpenBlock Geocoder, Part 2: Text Parsing and Entity Extraction</title><link>https://www.caktusgroup.com/blog/2011/12/16/openblock-geocoder-part-2-text-parsing-and-entity-extraction/</link><pubDate>Fri, 16 Dec 2011 13:30:19 +0000</pubDate><guid>https://www.caktusgroup.com/blog/2011/12/16/openblock-geocoder-part-2-text-parsing-and-entity-extraction/</guid><description>&lt;p>This is the second post in our &lt;a href="https://github.com/openrural" target="_blank" rel="noopener noreferrer">OpenRural&lt;/a>
series reviewing &lt;a href="http://openblockproject.org/" target="_blank" rel="noopener noreferrer">OpenBlock&lt;/a> and its
geocoder. &lt;a href="http://www.caktusgroup.com/blog/2011/12/12/openblock-geocoder-part-1-data-model-and-geocoding/" target="_blank" rel="noopener noreferrer">OpenBlock Geocoder, Part 1: Data Model and
Geocoding&lt;/a>
covers the internals of the OpenBlock geocoder and its geocoding
capabilities. As this post builds upon topics covered there, you may
wish to read Part 1 before proceeding. In this post we step back from
the internals of the geocoder and explore how to use it along with other
OpenBlock tools to parse unstructured text.&lt;/p></description></item><item><title>Scraping Data and Web Standards</title><link>https://www.caktusgroup.com/blog/2011/12/06/scraping-data-and-web-standards/</link><pubDate>Tue, 06 Dec 2011 21:00:00 +0000</pubDate><guid>https://www.caktusgroup.com/blog/2011/12/06/scraping-data-and-web-standards/</guid><description>&lt;p>We're currently involved in a project with the &lt;a href="http://jomc.unc.edu/" target="_blank" rel="noopener noreferrer">UNC School of
Journalism&lt;/a> that
hopes to help rural newspapers in North Carolina leverage
&lt;a href="http://openblockproject.org/" target="_blank" rel="noopener noreferrer">OpenBlock&lt;/a>.  The project is
called OpenRural, and if you're a software developer you can find the
latest code &lt;a href="https://github.com/openrural/openrural-nc" target="_blank" rel="noopener noreferrer">on
GitHub&lt;/a>.&lt;/p></description></item></channel></rss>