
Tulsa Data Platform options

[Image: Tulsa Data]

This weekend I’ve been hacking on one of the data ideas we’ve had: scraping the Tulsa Health Department’s restaurant inspection data. I’m evaluating a few options for an Open Data site/hosting platform, and I’m posting my evaluations here in the hope they’ll be useful to anyone else trying to do something similar. A rough sketch of the kind of scraper involved is below, followed by a basic comparison chart and details on each platform.
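First, the scraping side. This is only a minimal sketch of the idea — fetch the listing page, parse out the inspection rows, dump them somewhere structured — and the URL and table layout are hypothetical placeholders, not the health department’s actual page structure:

```python
# Minimal sketch of a restaurant-inspection scraper.
# NOTE: the URL and the page structure (a plain HTML table) are
# hypothetical placeholders, not the health department's real layout.
import csv

import requests
from bs4 import BeautifulSoup

URL = "http://www.tulsa-health.org/food-safety/inspections"  # hypothetical

resp = requests.get(URL, timeout=30)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

rows = []
for tr in soup.select("table tr")[1:]:  # [1:] skips the assumed header row
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:
        rows.append(cells)

# Dump to CSV so the data is usable before we pick a hosting platform.
with open("inspections.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

print(f"scraped {len(rows)} rows")
```

The interesting question is what to do with the output, which is what the comparison below is about.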

| | ScraperWiki | DataCouch | BuzzData | Socrata |
| --- | --- | --- | --- | --- |
| Open-source | Yes | Yes | No | No |
| Hosting | Cloud or Self | Cloud or Self | Cloud | Cloud |
| Data Licensing | Any (free-form) | ? | Creative Commons | Creative Commons, Public Domain |
| Data Formats IN | Anything with a URL | csv, json | .csv, .tsv, .xls files | .csv, .tsv, .xls files |
| Data Formats OUT | csv, json, html, rss | csv, json | source | csv, txt, json, xml, rdf, xls, pdf |
| Project Maturity | Stable | Pre-Alpha | Stable | “Enterprise” |

DataCouch

VERY unstable. A couple of Tulsa Web Devs have tried to set it up without any luck, and even the datacouch.com site itself goes up and down, or features stop working. Right now, for example, the Twitter sign-in is broken, so I can’t even tell what the data licensing is.

BuzzData

BuzzData seems more like a social site for sharing data files: there are no URLs for data sources, nor for the data you publish on the site. It features dataset history, additional attachments, links to articles and visualizations, collaborators, and followers for each dataset. It seems to fit academic and research collaboration better than application development.

Socrata

Socrata seems like the 800-lb gorilla of data platforms. It also traffics in files rather than serving data over HTTP request/response, so it’s less useful as a data source for application developers. Socrata seems like the solution we could pitch to city agencies if we ever convince them to open and publish data themselves; they have a “Socrata for Government” white paper and everything.

ScraperWiki

ScraperWiki is my favorite. It’s an open-source Django web app, but it has lots of additional pieces, which make the initial setup a little hard. (The ScraperWiki installation instructions have some gaps, too.) My favorite features:

  • It hosts both the scraper code AND the resulting data. (They gave us a scraper template that lets you host scraper code as a GitHub gist — or you could host your code anywhere that’s URL-accessible, I suppose.)
  • Scraper code can be Python, Ruby, PHP, or JavaScript, with lots of scraping libraries for each (especially Python!); a minimal example in this style follows the list.
  • Source data can be anything that’s URL-accessible, and there are lots of output formats.
  • It has features for both data developers AND data users — journalists, researchers, app developers — including “request data” (bonus: requests for non-public or non-open data are paid services) and a “get involved” dashboard.
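Here’s roughly what a Python scraper looks like in the classic ScraperWiki style: `scraperwiki.scrape()` fetches the page and `scraperwiki.sqlite.save()` persists rows into the hosted datastore. The URL, CSS selector, and column names below are hypothetical placeholders, not a real target:

```python
# Sketch of a scraper as it would run on ScraperWiki itself (Python).
# scraperwiki.scrape() and scraperwiki.sqlite.save() are the classic
# ScraperWiki datastore API; the URL, selector, and column names are
# hypothetical.
import scraperwiki
import lxml.html

html = scraperwiki.scrape("http://example.com/inspections")  # hypothetical URL
root = lxml.html.fromstring(html)

for tr in root.cssselect("table.inspections tr"):  # hypothetical selector
    cells = [td.text_content().strip() for td in tr.cssselect("td")]
    if len(cells) < 3:
        continue  # header or malformed row
    # unique_keys makes re-runs idempotent: matching rows are updated
    # in the hosted SQLite datastore instead of duplicated.
    scraperwiki.sqlite.save(
        unique_keys=["name", "address"],
        data={"name": cells[0], "address": cells[1], "score": cells[2]},
    )
```

Because the datastore rides along with the scraper, the saved rows then come back out through the output formats in the chart above (csv, json, html, rss).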

So, I set up my own ScraperWiki server, but I still have some things to iron out: I need to set up a mail server, and I need to find out why the scraper editor doesn’t work correctly. I’m having a Skype call with some devs from ScraperWiki, so maybe they can help out. Or we might end up putting our data on scraperwiki.com if we can host our scrapers on GitHub. We’ll see …
