An updated version (2.0) of the Flickr Shapefile Public Dataset was released last week. From Nils' official post:
… We haven’t completely forgotten about shapefiles and have finally gotten around to generating a new batch (read about Alpha Shapes to find out how it’s done). When Aaron did the first run we had somewhere around ninety million (90M) geotagged photos. Today we have over one hundred and ninety million (190M) and that number is growing rapidly. Of course lots of those will fall within the boundaries of the existing shapes and won’t give us any new information, but some of them will improve the boundaries of old shapes, and others will help create new shapes where there weren’t any before. Version 1 of the dataset had shapes for around one hundred and eighty thousand (180K) WOE IDs, and now we have shapes for roughly two hundred and seventy thousand (270K) WOE IDs. Woo. The dataset is available for download today, available for use under the Creative Commons Zero Waiver.
True to its claim, the version 2.0 release brings added fidelity to existing shapes (they conform more closely to each feature's true geographic shape as more human sensors perambulate), and it covers more cities and significantly more neighborhoods. From a data analytics perspective, I wish the new version had kept the per-feature summary photo count and centroid XY from version 1.0. But I'm very excited to see a new version released! Image above by Aaron Straup Cope. More coverage of things Flickr on Kelso’s Corner »
While the dataset is distributed in GeoJSON format, not every GIS application reads GeoJSON, so I’ve mirrored an ESRI Shapefile version of the Flickr Shapefile Public Dataset 2.0 with this blog post (~60 MB). Details on how I did the conversion after the jump.
Make sure GDAL (or whichever package you installed that bundles ogr2ogr) is included in your PATH after installing it. From Terminal.app, run:
Once you’re set up, the conversion command in Terminal.app takes the following form:
ogr2ogr -f "output file type" output_filename input_filename
ogr2ogr -f "ESRI Shapefile" /Volumes/Data/Downloads/flickr_shapes_public_dataset_2.0/flickr_shapes_neighbourhoods.shp /Volumes/Data/Downloads/flickr_shapes_public_dataset_2.0/flickr_shapes_neighbourhoods.geojson
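The dataset has one GeoJSON file per theme, so rather than typing the command once per file, a small loop can convert them all. This is a minimal sketch, assuming a POSIX shell and that all the .geojson files sit together in one directory; the function names convert_all and shp_path_for are hypothetical helpers, not part of GDAL.

```shell
#!/bin/sh
# Hypothetical helper: map foo.geojson -> foo.shp (same directory, same base name).
shp_path_for() {
  printf '%s\n' "${1%.geojson}.shp"
}

# Hypothetical helper: convert every *.geojson file in a directory with ogr2ogr.
# Usage: convert_all /path/to/flickr_shapes_public_dataset_2.0
convert_all() {
  dir=$1
  # Fail early if ogr2ogr is not on the PATH.
  if ! command -v ogr2ogr >/dev/null 2>&1; then
    echo "ogr2ogr not found; check your PATH" >&2
    return 1
  fi
  for src in "$dir"/*.geojson; do
    [ -e "$src" ] || continue   # the glob matched nothing
    dst=$(shp_path_for "$src")
    echo "Converting $src -> $dst"
    # Same form as above: ogr2ogr -f "output file type" output_filename input_filename
    ogr2ogr -f "ESRI Shapefile" "$dst" "$src" || return 1
  done
}
```

For example, convert_all /Volumes/Data/Downloads/flickr_shapes_public_dataset_2.0 writes a .shp (plus its companion files) next to each GeoJSON file.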
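Warnings generated during the conversion (ESRI Shapefile attribute tables are dBASE files, which limit field names to 10 characters, so ogr2ogr renames longer ones):
Warning 6: Normalized/laundered field name: 'place_type_id' to 'place_ty_1'
Warning 6: Normalized/laundered field name: 'superseded_by' to 'superseded'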
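To see ahead of time which attribute names will trip these warnings, it is enough to look for names longer than the 10-character dBASE limit. The function below is a hypothetical sketch of that check only; it flags the names but does not try to reproduce GDAL's exact renaming scheme (such as the "_1" suffix above).

```shell
#!/bin/sh
# Hypothetical helper: print each field name longer than the 10-character
# dBASE limit used by ESRI Shapefile attribute tables. ogr2ogr will rename
# ("launder") these names during conversion.
long_field_names() {
  for field in "$@"; do
    # ${#field} is the length of the string in characters.
    if [ "${#field}" -gt 10 ]; then
      printf '%s\n' "$field"
    fi
  done
}
```

Running long_field_names place_type_id superseded_by label prints place_type_id and superseded_by, the two fields renamed in the warnings above.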
One more caveat: ogr2ogr burps on the UTF-8 to Windows-1252 (the encoding ESRI shapefiles use) character conversion, so some accented characters come out as gibberish in this quick-and-dirty conversion. Your mileage will vary country by country (worst in Asia, Iceland, etc.).
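Newer GDAL releases let the Shapefile driver declare the attribute table encoding through its ENCODING layer creation option, which should keep accented characters intact as long as the reading GIS application honors that declaration. This is a sketch assuming a GDAL build that supports the option (check the Shapefile driver documentation for your version); it is not how the mirrored files were produced, and the function name convert_utf8 is hypothetical.

```shell
#!/bin/sh
# Hypothetical variant of the conversion command that asks the Shapefile
# driver to store attribute text as UTF-8 instead of converting it.
# Assumes your GDAL build supports the ENCODING layer creation option (-lco).
convert_utf8() {
  src=$1
  dst=$2
  ogr2ogr -f "ESRI Shapefile" -lco ENCODING=UTF-8 "$dst" "$src"
}
```

Usage: convert_utf8 flickr_shapes_neighbourhoods.geojson flickr_shapes_neighbourhoods.shp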
Apart from the renamed fields above, the only deliberate difference between these files and the GeoJSON originals is a new “name” column in each theme. The originals have a “label” column, but it holds the full hierarchical name, for example “Eureka, California, United States of America”. The added “name” column contains only the first part of that label (“Eureka”), making it easier to auto-label the features in your favorite GIS app.
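The rule behind the added “name” column is simple: keep everything before the first comma of “label”. The function below is a minimal sketch of that rule for a single value, using POSIX parameter expansion; the function name is hypothetical, and this is an illustration of the rule rather than necessarily how the column in the mirrored files was computed.

```shell
#!/bin/sh
# Hypothetical helper: return the first comma-separated part of a Flickr
# "label" value, i.e. what the added "name" column holds.
# ${1%%,*} removes the longest suffix starting at a comma, leaving the text
# before the first comma (or the whole label if it contains no comma).
label_to_name() {
  printf '%s\n' "${1%%,*}"
}
```

For example, label_to_name "Eureka, California, United States of America" prints Eureka, and a label with no comma is returned unchanged.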