Posts Tagged ‘subgroup’

Travellr: Behind the Scenes of our Region-Based Clusters (Google GeoDev)

Monday, July 6th, 2009

[Editor's note: The age-old rule for choropleth mapping, which suggests aggregating by multi-scale areal units based on the map's zoom level, is slowly seeping into "clustering" for the point-based mashup geo community. This overview from Travellr, published on the Google GeoDevelopers blog, includes two illustrations that show the power of this technique. I used such a technique (different implementation) on The Washington Post's recent swine flu mapping.]

Republished from Google GeoDevelopers Blog.
Monday, June 22, 2009

Recently, there has been a lot of interest in clustering algorithms. The client-side grid-based MarkerClusterer was released in the open source library this year, and various server-side algorithms were discussed in the Performance Tips I/O talk. We’ve invited the Travellr development team to give us insight on their unique regional clustering technique.

Travellr is a location-aware answers service where people can ask travel-related questions about anywhere in the world. One of its features is a map-based interface to the site's questions, built on Google Maps.

Figure 1. An example of the Travellr Map, showing question markers for Australia.

Clustering for usability
We learned that the best way to display markers without cluttering our map was to cluster our questions according to how far the user has zoomed in. When the user is looking at a map of the continents, we cluster our questions into one marker per continent. When the user zooms in to France, we cluster our questions into a marker for each region or city that has questions. By clustering our data into cities, regions/states, countries, and continents, we can display relevant markers on the map for whatever zoom level the user is looking at.

Optimizing for Clustering
Our next challenge was how to extract clustered data from our database without causing excessive server load. Every time the user pans or zooms the map, we need to query and fetch new clustered data in order to display the markers. We might also have to limit the data if the user has selected a tag, as we’re then only interested in questions related to that topic (e.g., “surfing”). Doing this in real time would be painfully slow, as you would need to cluster thousands of questions in thousands of locations with hundreds of tags on the fly. The answer? Pre-cluster your data, of course!

Step 1. Structure your location data
When a question is asked about a city on Travellr, we also know its region/state, country and continent. We store more than 55,000 location points as a hierarchy, with each location “owning” its descendant nodes (and all of their data). Our locations are stored in a Modified Preorder Tree (also called Nested Sets). Modified Preorder Trees are a popular method of storing hierarchical data in a flat database table, with a focus on efficient data retrieval and easy handling of subtrees. For each location we also keep a record of its depth within the tree, its location type (continent, country, region/state, or city), and its co-ordinates (retrieved using the Google Maps geocoder).
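To make the nested-set idea concrete, here is a minimal Python sketch. It is not Travellr's code: the `Location` class, the `lft`/`rgt` field names, and the sample rows are assumptions made for illustration. Each node stores the "left" and "right" numbers it receives during a preorder walk of the tree, so a node's descendants are exactly the rows whose numbers fall strictly inside its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    name: str
    kind: str   # "continent", "country", "region", or "city"
    depth: int  # depth within the tree, 0 for the root
    lft: int    # number assigned when the preorder walk enters the node
    rgt: int    # number assigned when the preorder walk leaves the node


# Hypothetical sample rows, as they might sit in a flat database table.
LOCATIONS = [
    Location("Oceania", "continent", 0, 1, 12),
    Location("Australia", "country", 1, 2, 11),
    Location("New South Wales", "region", 2, 3, 8),
    Location("Sydney", "city", 3, 4, 5),
    Location("Byron Bay", "city", 3, 6, 7),
    Location("Victoria", "region", 2, 9, 10),
]


def descendants(node, locations):
    """Every location 'owned' by node: its interval strictly contains theirs."""
    return [loc for loc in locations if node.lft < loc.lft and loc.rgt < node.rgt]


def ancestors(node, locations):
    """Every location that owns node, ordered from the root downwards."""
    owners = [loc for loc in locations if loc.lft < node.lft and node.rgt < loc.rgt]
    return sorted(owners, key=lambda loc: loc.depth)


def descendant_count(node):
    """Subtree size follows from the interval width alone, with no scan needed."""
    return (node.rgt - node.lft - 1) // 2
```

In a relational database the same tests become simple range conditions on two integer columns, which is why reading a whole subtree is cheap. The trade-off is that inserting a location means renumbering part of the table.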

Step 2. Aggregate your data
We calculate aggregate data for every branch of our locations tree ahead of time. By storing aggregate data for cities, regions/states, countries, and continents, we provide an extremely fast and inexpensive method to query our locations database for any information regarding questions asked about a particular location. This data is updated every few minutes by a server-side task.

Our aggregations include:

  • Total question count for a location
  • Most popular tags for that location
  • Number of questions associated with each of those tags.
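The aggregation step could look roughly like the Python sketch below. It is an illustration under assumptions, not the production task: the `PARENT` map, the `QUESTIONS` sample, and the `aggregate` function are hypothetical, and a plain parent lookup stands in for the nested-set table to keep the example short. Each question is counted once for its city and once for every ancestor of that city.

```python
from collections import Counter, defaultdict

# Hypothetical hierarchy: each location name maps to its parent (None = root).
PARENT = {
    "Sydney": "New South Wales",
    "Byron Bay": "New South Wales",
    "New South Wales": "Australia",
    "Australia": "Oceania",
    "Oceania": None,
}

# Hypothetical questions: (city the question is about, tags on the question).
QUESTIONS = [
    ("Sydney", ["food", "surfing"]),
    ("Byron Bay", ["surfing"]),
    ("Byron Bay", ["surfing", "hostels"]),
]


def aggregate(questions, parent, top_n=2):
    """Roll question totals and per-tag counts up every branch of the tree."""
    totals = Counter()
    tag_counts = defaultdict(Counter)
    for city, tags in questions:
        node = city
        while node is not None:
            totals[node] += 1
            # sorted(set(...)) counts a tag once per question, in a stable order.
            tag_counts[node].update(sorted(set(tags)))
            node = parent[node]
    return {
        name: {
            "total": totals[name],
            # The most popular tags, each paired with its question count.
            "top_tags": tag_counts[name].most_common(top_n),
        }
        for name in totals
    }
```

Run periodically, a task like this produces one small record per location, so the map query never has to touch individual questions.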

How we query our structured, aggregate data on the map
Whenever the user zooms or pans the map we fire off a query to our (unpublished ;) API with the tags they are searching for, the current zoom level, and the edge co-ordinates of the map’s bounding box. Based on the zoom level (Figure 2) we work out whether we want to display markers for continents, countries, states, or cities. We then send back the data for these markers and display them on the map.

Figure 2. Clustering at different zoom levels (blue = continents, countries, pink = states, cities)
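As a rough illustration of the query described above, the following Python sketch picks a location type from the zoom level, keeps the pre-aggregated markers that fall inside the bounding box, and optionally narrows the count to one tag. Since the API is unpublished, everything here is an assumption: the zoom thresholds, the `Marker` fields, the function names, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Marker:
    name: str
    kind: str    # "continent", "country", "region", or "city"
    lat: float
    lng: float
    total: int   # pre-aggregated question count for this location
    tag_counts: dict = field(default_factory=dict)  # pre-aggregated per-tag counts


# Hypothetical pre-clustered rows.
MARKERS = [
    Marker("Oceania", "continent", -25.0, 140.0, 3, {"surfing": 3}),
    Marker("Australia", "country", -25.3, 133.8, 3, {"surfing": 3, "food": 1}),
    Marker("Sydney", "city", -33.87, 151.21, 1, {"food": 1, "surfing": 1}),
    Marker("Byron Bay", "city", -28.64, 153.61, 2, {"surfing": 2, "hostels": 1}),
]


def kind_for_zoom(zoom):
    """Map a zoom level to a location type (thresholds are illustrative guesses)."""
    if zoom <= 2:
        return "continent"
    if zoom <= 5:
        return "country"
    if zoom <= 8:
        return "region"
    return "city"


def in_bounds(marker, south, west, north, east):
    """True if the marker lies inside the map's bounding box."""
    if not south <= marker.lat <= north:
        return False
    if west <= east:
        return west <= marker.lng <= east
    # The box crosses the 180th meridian, so the longitude range wraps around.
    return marker.lng >= west or marker.lng <= east


def markers_for_view(markers, zoom, bounds, tag=None):
    """Return (name, lat, lng, count) for each marker to draw in this view."""
    kind = kind_for_zoom(zoom)
    visible = []
    for m in markers:
        if m.kind != kind or not in_bounds(m, *bounds):
            continue
        count = m.total if tag is None else m.tag_counts.get(tag, 0)
        if count > 0:
            visible.append((m.name, m.lat, m.lng, count))
    return visible
```

For example, `markers_for_view(MARKERS, 10, (-40, 140, -20, 160), tag="hostels")` would return only the Byron Bay marker, because at that zoom the sketch shows cities and only Byron Bay has a question tagged "hostels".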

Everyone Wins
So what is the result of structuring and aggregating our data in such a way? It means that we have nicely organized, pre-clustered data that can be read from cheaply and easily. This allows us to provide a super-fast map interface for Travellr that puts minimal load on our infrastructure. Everyone is happy!

Comments or Questions?
We’d love to hear from you if you have any questions on how we did things, or suggestions or comments about Travellr’s map. This article was written by Travellr’s performance and scalability expert Michael Shaw (from Insight4) and our client-side scripting aficionado Jaidev Soin.

You can visit Travellr at, or follow us on Twitter at

CIA World Factbook Relation Browser (moritz.stefaner)

Wednesday, June 10th, 2009


[Editor's note: This interactive visualization browses the CIA World Factbook topology of geography by country. Featured edge relationships include neighboring countries, spoken language, and more.]

Republished from Moritz.Stefaner.

This radial browser was designed to display complex concept network structures in a snappy and intuitive manner. It can be used to visualize conceptual structures, social networks, or anything else that can be expressed in nodes and links.

The CIA Factbook demo displays the relations between countries, continents, languages and oceans found in the CIA World Factbook database. Click the center node for detailed information, or click an adjacent node to move it to the center. The arrows at the top left can be used to navigate your click history. Use the dropdown in the upper right to access nodes directly by name. The varying distance from the center for nodes with many neighbors was introduced only to enhance legibility and carries no special meaning.

Jump Starting the Global Economy (Wash Post)

Tuesday, April 7th, 2009


[Editor's note: Find the trends, group them together, and use that hierarchy (topology) as an access metaphor. And remember geography doesn't always need to mean map.]

Republished from The Washington Post.
Original publication date: March 29th, 2009.
By Karen Yourish And Todd Lindeman — The Washington Post.

The stimulus packages approved by the G-20 countries total $1.6 trillion. More than half of that comes from the United States.

Disappearing Birds (Wash Post)

Monday, March 23rd, 2009

[Editor's note: "Habitat loss has sent many bird species into decline across the United States." This chart shows the percent change in bird population since 1968, by habitat. I like three things about this chart: (1) it uses direct labeling on the green and red lines, which makes it easy for everyone to understand and gives color-blind viewers access to the encoded information (see post), and (2) the chart segments out important thematic subtrends in the dataset. Also, (3) I worked on a bird migration supplement (wall) map for National Geographic in 2004, and the Cornell Lab of Ornithology has some of the coolest time-based mapping techniques around. See original artwork from the North America side of the supplement now through May at NG Explorers Hall in DC.]

Republished from The Washington Post.
Graphic by Patterson Clark. March 20, 2009.

Related story by Juliet Eilperin.

Major Decline Found In Some Bird Groups
But Conservation Has Helped Others

Several major bird populations have plummeted over the past four decades across the United States as development transformed the nation’s landscape, according to a comprehensive survey released yesterday by the Interior Department and outside experts, but conservation efforts have staved off potential extinctions of others.

“The State of the Birds” report, a broad analysis of data compiled from scientific and citizen surveys over 40 years, shows that some species have made significant gains even as others have suffered. Hunted waterfowl and iconic species such as the bald eagle have expanded in number, the report said, while populations of birds along the nation’s coasts and in its arid areas and grasslands have declined sharply.

From the report: “Reveals troubling declines of bird populations during the past 40 years—a warning signal of the failing health of our ecosystems. At the same time, we see heartening evidence that strategic land management and conservation action can reverse declines of birds. This report calls attention to the collective efforts needed to protect nature’s resources for the benefit of people and wildlife.”

Continue reading at The Washington Post . . .