Posts Tagged ‘neighborhoods’

Betashapes for San Francisco neighborhoods

Sunday, August 21st, 2011

[Editor’s note: A first for me: read this post in Romanian/Ukrainian Cyrillic. Thanks Maria!]

First results in using the betashapes script from Schuyler Erle and Melissa Santos. Still some kinks for me to work out: understanding how the script deals with donut holes, and scrubbing the list of Yahoo GeoPlanet neighborhood names in the input. For the US, you can just use Census block polygons and avoid OSM copyright funk. To see yourself in the map, use GeoPDFs on the iPhone and other iOS devices with Avenza’s PDF Maps.app. More images after the jump. Click on an image to see it larger.

More background: Betashapes are based on how people tag (and only if they also geotag) their Flickr photos. The script queries Flickr for photos with the specified neighborhood tags (a sample of up to 2,500 photos per ‘hood; these SF neighborhoods were calculated from ~250,000 photo locations), counts up which neighborhood tag is dominant in each city block, and then aggregates the blocks into neighborhoods. The neighborhood names and ids are from the Yahoo! GeoPlanet database, which has a mix of real and fanciful (minority report) places. If you remove the ones you don’t agree with from the input, they will be ignored in the output. Lots to refine here…
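To make the "dominant tag per block" idea concrete, here is a minimal Python sketch. It is not the betashapes script itself: the block ids, the sample data, and both function names are made up for illustration, and it assumes each photo has already been matched to the block polygon that contains it.

```python
from collections import Counter, defaultdict


def dominant_tag_per_block(photo_hits):
    """photo_hits: iterable of (block_id, neighborhood_tag), one per geotagged photo."""
    votes = defaultdict(Counter)
    for block_id, tag in photo_hits:
        votes[block_id][tag] += 1
    # most_common(1) picks the tag with the most photos in that block;
    # on a tie, the tag that was seen first wins.
    return {block: counts.most_common(1)[0][0] for block, counts in votes.items()}


def group_blocks_by_neighborhood(block_to_tag):
    """Collect the blocks won by each tag; a real run would union their polygons."""
    neighborhoods = defaultdict(list)
    for block, tag in block_to_tag.items():
        neighborhoods[tag].append(block)
    return {tag: sorted(blocks) for tag, blocks in neighborhoods.items()}


# Hypothetical sample: three blocks, two competing neighborhood tags.
hits = [
    ("block_1", "Mission"), ("block_1", "Mission"), ("block_1", "Castro"),
    ("block_2", "Castro"), ("block_2", "Castro"),
    ("block_3", "Mission"),
]
winners = dominant_tag_per_block(hits)
shapes = group_blocks_by_neighborhood(winners)
```

Dropping a neighborhood name from the input simply means it never receives votes, so no block can be assigned to it.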

View GeoPDF »
Download betashapes shapefile for SF »
Download SF geodata ingredients »
Download Y! GeoPlanet SF files »

(below) Betashapes using Flickr images and city street grid turned into polygon blocks.
sf_neighborhoods_betashapes

(more…)

Flickr Shapefiles browser (MapToPixel)

Wednesday, June 16th, 2010

[Editor’s note: Also check out Aaron’s WOE ID browser (the geography behind Flickr). The Flickr API returns both ESRI format shapefiles and XML / JSON. The monster dump of all Flickr shapes is just XML, however. Thanks GeoPDX!]

Republished from MapToPixel.

Flickr Shapefiles are a set of polygons generated from the geo-tags of photos on Flickr. Using the names assigned by people to their own images, the dataset offers boundaries for loads of places around the world. The code.flickr blog has more info and details of their generation. The idea is that using people’s tags of locations to form boundaries gives a large dataset of where people think particular places are.

The Boundaries project uses Flickr Shapefiles to show neighbourhoods and their neighbouring places. Other than that, there aren’t many examples on the web. I’ve put together an example that uses ModestMaps and the Flickr API to display the Shapefiles in Flash. The polygons are retrieved using a bounding box query to the Flickr API, decoded from JSON, and drawn, and can be identified with a mouse hover.
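The decode-and-hover steps can be sketched in a few lines of Python (the original example is in Flash, so this is only an illustration). The JSON payload below is a made-up stand-in, not a confirmed Flickr response: the `shapes`, `name`, and `polyline` keys and the space-separated "lat,lon" format are assumptions, and no Flickr API call is made.

```python
import json


def parse_polyline(text):
    """Turn 'lat,lon lat,lon ...' into a list of (lat, lon) float tuples."""
    return [tuple(float(v) for v in pair.split(",")) for pair in text.split()]


def point_in_polygon(lat, lon, ring):
    """Ray-casting test: count how many polygon edges a ray from the point crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        lat1, lon1 = ring[i]
        lat2, lon2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside


def shapes_under_cursor(payload, lat, lon):
    """Return the names of all decoded shapes that contain the hover position."""
    shapes = json.loads(payload)["shapes"]
    return [s["name"] for s in shapes
            if point_in_polygon(lat, lon, parse_polyline(s["polyline"]))]


# Hypothetical payload: one square "place" spanning 0..2 in both axes.
payload = '{"shapes": [{"name": "Example Place", "polyline": "0,0 0,2 2,2 2,0"}]}'
hovered = shapes_under_cursor(payload, 1.0, 1.0)
missed = shapes_under_cursor(payload, 3.0, 1.0)
```

A map library would first convert the mouse position from screen pixels to latitude and longitude before running a test like this.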

Continue reading at MapToPixel . . .

Travellr: Behind the Scenes of our Region-Based Clusters (Google GeoDev)

Monday, July 6th, 2009

[Editor’s note: The age-old rule for choropleth mapping that suggests aggregation by multi-scale areal units based on the map’s zoom level is slowly seeping into “clustering” for the point-based mashup geo community. This overview from Travellr published on the Google GeoDevelopers blog includes two illustrations that show the power of this technique. I used such a technique (different implementation) on The Washington Post’s recent swine flu mapping.]

Republished from Google GeoDevelopers Blog.
Monday, June 22, 2009

Recently, there has been a lot of interest in clustering algorithms. The client-side grid-based MarkerClusterer was released in the open source library this year, and various server-side algorithms were discussed in the Performance Tips I/O talk. We’ve invited the Travellr development team to give us insight on their unique regional clustering technique.

Travellr is a location aware answers service where people can ask travel-related questions about anywhere in the world. One of its features is a map-based interface to questions on the site using Google Maps.

Figure 1. An example of the Travellr Map, showing question markers for Australia.

Clustering for usability
We learned that the best way to display markers without cluttering our map was to cluster our questions depending on how far you zoom in. If the user is looking at a map of the continents, we cluster our questions into a marker for each continent. If the user zooms in to France, we then cluster our questions into a marker for each region or city that has questions. By clustering our data into cities, regions/states, countries, and continents, we can display relevant markers on the map depending on what zoom level the user is looking at.

Optimizing for Clustering
Our next challenge was how to extract clustered data from our database without causing excessive server load. Every time the user pans and zooms on the map, we need to query and fetch new clustered data in order to display the markers on the map. We also might have to limit the data if the user has selected a tag, as we’re only interested in questions related to a topic (e.g., “surfing”). To execute this in real-time would be painstakingly slow, as you would need to cluster thousands of questions in thousands of locations with hundreds of tags on the fly. The answer? Pre-cluster your data of course!

Step 1. Structure your location data
When a question is asked about a city on Travellr, we also know its region/state, country and continent. We store more than 55,000 location points as a hierarchy, with each location “owning” its descendant nodes (and all of their data). Our locations are stored in a Modified Preorder Tree (also called Nested Sets). Modified Preorder Trees are a popular method of storing hierarchical data in a flat database table, having a focus on efficient data retrieval, and easy handling of sub trees. For each location we also keep a record of its depth within the tree, its location type (continent, country, region/state, or city), and its co-ordinates (retrieved using the Google Maps geocoder).
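For readers who have not met Nested Sets before, the following Python sketch shows the general technique with an in-memory SQLite table. It is not Travellr's schema: the table and column names, the sample places, and the helper functions are assumptions for illustration, and co-ordinates are left out to keep it short.

```python
import sqlite3

# Hypothetical sample hierarchy: (name, location type, children).
TREE = ("Europe", "continent", [
    ("France", "country", [
        ("Ile-de-France", "region", [("Paris", "city", [])]),
        ("Brittany", "region", [("Rennes", "city", [])]),
    ]),
    ("Spain", "country", []),
])


def number_nodes(node, depth=0, counter=None, rows=None):
    """Depth-first walk that gives every node a left and right visit number."""
    if counter is None:
        counter, rows = [0], []
    name, kind, children = node
    counter[0] += 1
    left = counter[0]  # numbered on the way down
    for child in children:
        number_nodes(child, depth + 1, counter, rows)
    counter[0] += 1  # numbered again on the way back up
    rows.append((name, kind, depth, left, counter[0]))
    return rows


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE locations "
           "(name TEXT, kind TEXT, depth INTEGER, lft INTEGER, rgt INTEGER)")
db.executemany("INSERT INTO locations VALUES (?, ?, ?, ?, ?)", number_nodes(TREE))


def descendants(name):
    """Everything a location 'owns': rows whose interval lies inside the parent's."""
    rows = db.execute(
        "SELECT child.name FROM locations AS child, locations AS parent "
        "WHERE parent.name = ? AND child.lft > parent.lft AND child.rgt < parent.rgt "
        "ORDER BY child.lft", (name,))
    return [r[0] for r in rows]


def ancestors(name):
    """The chain above a location: rows whose interval encloses the node's."""
    rows = db.execute(
        "SELECT parent.name FROM locations AS child, locations AS parent "
        "WHERE child.name = ? AND parent.lft < child.lft AND parent.rgt > child.rgt "
        "ORDER BY parent.lft", (name,))
    return [r[0] for r in rows]
```

The appeal of this layout is that a whole subtree, or a whole ancestor chain, comes back from a single range comparison rather than a recursive lookup.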

Step 2. Aggregate your data
We calculate aggregate data for every branch of our locations tree ahead of time. By storing aggregate data for cities, regions/states, countries, and continents, we provide an extremely fast and inexpensive method to query our locations database for any information regarding questions asked about a particular location. This data is updated every few minutes by a server-side task.

Our aggregations include:

  • Total question count for a location
  • Most popular tags for that location
  • Number of questions associated with each of those tags.
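As a rough illustration of this kind of roll-up, the Python sketch below credits each question to its city and to every ancestor of that city. The parent map, the sample questions, and the function name are hypothetical; the article does not say how the real server-side task is implemented or stored.

```python
from collections import Counter, defaultdict

# Hypothetical hierarchy: each location points to its parent (None at the top).
PARENT = {
    "Paris": "France", "Lyon": "France", "San Sebastian": "Spain",
    "France": "Europe", "Spain": "Europe", "Europe": None,
}

# Hypothetical questions: (city the question is about, tags on the question).
QUESTIONS = [
    ("Paris", ["museums", "food"]),
    ("Paris", ["food"]),
    ("Lyon", ["food"]),
    ("San Sebastian", ["surfing"]),
]


def build_aggregates(questions, parent):
    """Pre-compute question totals and per-tag counts for every location."""
    totals = Counter()
    tag_counts = defaultdict(Counter)
    for city, tags in questions:
        node = city
        # Walk up the tree so region, country and continent all get the credit.
        while node is not None:
            totals[node] += 1
            tag_counts[node].update(tags)
            node = parent[node]
    return totals, tag_counts


totals, tag_counts = build_aggregates(QUESTIONS, PARENT)
top_tag_in_france = tag_counts["France"].most_common(1)
```

Because the totals are computed ahead of time, answering "how many surfing questions are there in Europe?" becomes a single lookup instead of a scan over every question.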

How we query our structured, aggregate data on the map
Whenever the user zooms or pans the map we fire off a query to our (unpublished ;) API with the tags they are searching for, the current zoom level, and the edge co-ordinates of the map’s bounding box. Based on the zoom level (Figure 2) we work out whether we want to display markers for continents, countries, states, or cities. We then send back the data for these markers and display them on the map.
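Since the API is unpublished, the following Python sketch is only a guess at the shape of such a query. The zoom thresholds, the marker records, and the function names are all assumptions; it also ignores bounding boxes that wrap across the antimeridian.

```python
def level_for_zoom(zoom):
    """Map a map zoom level to a location type (thresholds are hypothetical)."""
    if zoom <= 2:
        return "continent"
    if zoom <= 5:
        return "country"
    if zoom <= 8:
        return "region"
    return "city"


# Hypothetical pre-aggregated rows, one per location.
MARKERS = [
    {"name": "Europe", "kind": "continent", "lat": 54.0, "lon": 15.0,
     "total": 4, "tags": {"food": 3, "museums": 1, "surfing": 1}},
    {"name": "France", "kind": "country", "lat": 46.6, "lon": 2.2,
     "total": 3, "tags": {"food": 3, "museums": 1}},
    {"name": "Spain", "kind": "country", "lat": 40.4, "lon": -3.7,
     "total": 1, "tags": {"surfing": 1}},
    {"name": "Paris", "kind": "city", "lat": 48.86, "lon": 2.35,
     "total": 2, "tags": {"food": 2, "museums": 1}},
]


def markers_for_view(zoom, south, west, north, east, tag=None):
    """Return (name, question count) for markers of the right type inside the box."""
    kind = level_for_zoom(zoom)
    result = []
    for m in MARKERS:
        if m["kind"] != kind:
            continue
        if not (south <= m["lat"] <= north and west <= m["lon"] <= east):
            continue
        # With a tag selected, use that tag's count; otherwise the overall total.
        count = m["tags"].get(tag, 0) if tag else m["total"]
        if count:
            result.append((m["name"], count))
    return result


countries = markers_for_view(4, 35.0, -10.0, 60.0, 20.0)
surfing = markers_for_view(4, 35.0, -10.0, 60.0, 20.0, tag="surfing")
```

Every request is then a filter over pre-computed rows, with no clustering work done at query time.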

Figure 2. Clustering at different zoom levels (blue = continents, countries, pink = states, cities)

Everyone Wins
So what is the result of structuring and aggregating our data in such a way? It means that we have nicely organized, pre-clustered data that can be read from cheaply and easily. This allows us to provide a super-fast map interface for Travellr that puts minimal load on our infrastructure. Everyone is happy!

Comments or Questions?
We’d love to hear from you if you have any questions on how we did things, or suggestions or comments about Travellr’s map. This article was written by Travellr’s performance and scalability expert Michael Shaw (from Insight4) and our client-side scripting aficionado Jaidev Soin.

You can visit Travellr at www.travellr.com, or follow us on Twitter at twitter.com/travellr.