This weekend I’ve been hacking on one of the data ideas we’ve had: scraping the Tulsa Health Department’s Restaurant Inspection data. I’m evaluating a few options for an Open Data site/host, and I’m posting my evaluations here in the hope they’ll be useful for anyone else trying to do something similar. There’s a basic comparison chart below, followed by details on each option.
| | ScraperWiki | DataCouch | BuzzData | Socrata |
|---|---|---|---|---|
| Open-source | Yes | Yes | No | No |
| Hosting | Cloud or Self | Cloud or Self | Cloud | Cloud |
| Data Licensing | Any (free-form) | ? | Creative Commons | Creative Commons, Public Domain |
| Data Formats In | Anything with a URL | CSV, JSON | .csv, .tsv, .xls files | .csv, .tsv, .xls files |
| Data Formats Out | CSV, JSON, HTML, RSS | CSV, JSON | source file | CSV, TXT, JSON, XML, RDF, XLS, PDF |
| Project Maturity | Stable | Pre-alpha | Stable | “Enterprise” |
DataCouch
VERY unstable. A couple of Tulsa Web Devs have tried to set it up without any luck. Even the datacouch.com site itself goes up and down, and sometimes features don’t work; right now, for example, the Twitter sign-in is broken, so I can’t even tell what the data licensing is.
BuzzData
BuzzData seems more like a social site for sharing data files: there are no URLs for data sources, nor for the data you publish on the site. It features dataset history, additional attachments, links to articles and visualizations, collaborators, and followers for each dataset. It seems to fit academic and research collaboration better than application development.
Socrata
Socrata seems like the 800-pound gorilla of data platforms. It also deals in files rather than data over HTTP request/response, so it’s less useful as a data source for application developers. Socrata seems like the solution we could pitch to city agencies if we ever convince them to open and publish data themselves. They have a “Socrata for Government” white paper and everything.
ScraperWiki
ScraperWiki is my favorite. It’s an open-source Django web app, but it has lots of additional pieces, which makes the initial setup a little hard. (The ScraperWiki installation instructions have some gaps too.) My favorite features (with a quick scraper sketch after the list):
- It hosts both the scraper code AND the resulting data. (They gave us a scraper template that lets you host scraper code as a GitHub gist, or you could host your code anywhere that’s URL-accessible, I suppose.)
- Scraper code can be Python, Ruby, PHP, or JavaScript, with lots of scraping libraries for each (especially Python!).
- Source data can be anything that’s URL-accessible, and there are lots of output formats.
- It has features for both data developers AND data users (journalists, researchers, app developers), including “request data” (bonus: requests for non-public or non-open data are paid services) and a “get involved” dashboard.
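To make that concrete, here’s a minimal sketch of what a ScraperWiki scraper for the inspection data might look like. The URL and the HTML structure are assumptions on my part; the real Tulsa Health Department pages will need their own parsing logic. I’m using ScraperWiki’s Python library (`scraperwiki.scrape` plus the SQLite datastore) the way their tutorials show:

```python
# Minimal ScraperWiki scraper sketch. The source URL and the HTML
# structure below are hypothetical placeholders, not the real
# Tulsa Health Department markup.
import scraperwiki
import lxml.html

url = "http://www.tulsa-health.org/food-safety/inspections"  # assumed URL
html = scraperwiki.scrape(url)       # fetch the page
root = lxml.html.fromstring(html)

# Pretend each inspection is a table row like:
#   <tr><td>Name</td><td>Address</td><td>Date</td><td>Score</td></tr>
for tr in root.cssselect("table.inspections tr"):
    cells = [td.text_content().strip() for td in tr.cssselect("td")]
    if len(cells) < 4:
        continue  # skip the header row and anything malformed
    record = {
        "name": cells[0],
        "address": cells[1],
        "date": cells[2],
        "score": cells[3],
    }
    # Store in the ScraperWiki datastore, keyed on (name, date)
    scraperwiki.sqlite.save(unique_keys=["name", "date"], data=record)
```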
So, I set up my own ScraperWiki server, but I still have some things to iron out: I need to set up a mail server, and I need to figure out why the scraper editor doesn’t work correctly. I’m having a Skype call with some devs from ScraperWiki, so maybe they can help out. Or we might end up putting our data on scraperwiki.com if we can host our scrapers on GitHub. We’ll see …
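For what it’s worth, if the data does end up on scraperwiki.com, apps could pull it straight over HTTP. Here’s a rough sketch; the scraper name is made up, and the endpoint follows my reading of the ScraperWiki datastore API docs, so treat the details as assumptions:

```python
# Pull scraped data back out of scraperwiki.com as JSON. The scraper
# name below is hypothetical, and the endpoint follows my reading of
# the ScraperWiki datastore API docs.
import json
import urllib2

api = ("https://api.scraperwiki.com/api/1.0/datastore/sqlite"
       "?format=jsondict"
       "&name=tulsa_restaurant_inspections"  # hypothetical scraper name
       "&query=select+*+from+swdata")

rows = json.load(urllib2.urlopen(api))
for row in rows[:5]:
    print row["name"], row["score"]
```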