Posts Tagged ‘continents’

Travellr: Behind the Scenes of our Region-Based Clusters (Google GeoDev)

Monday, July 6th, 2009

[Editor's note: The age-old rule for choropleth mapping, which suggests aggregating data into multi-scale areal units based on the map's zoom level, is slowly seeping into "clustering" for the point-based mashup geo community. This overview from Travellr, published on the Google GeoDevelopers blog, includes two illustrations that show the power of this technique. I used a similar technique (with a different implementation) for The Washington Post's recent swine flu mapping.]

Republished from Google GeoDevelopers Blog.
Monday, June 22, 2009

Recently, there has been a lot of interest in clustering algorithms. The client-side grid-based MarkerClusterer was released in the open source library this year, and various server-side algorithms were discussed in the Performance Tips talk at Google I/O. We’ve invited the Travellr development team to give us insight into their unique regional clustering technique.

Travellr is a location-aware answers service where people can ask travel-related questions about anywhere in the world. One of its features is a map-based interface to questions on the site using Google Maps.

Figure 1. An example of the Travellr Map, showing question markers for Australia.

Clustering for usability
We learned that the best way to display markers without cluttering our map was to cluster our questions according to how far the user had zoomed in. If the user was looking at a map of the continents, we would cluster our questions into a marker for each continent. If the user zoomed in to France, we would then cluster our questions into a marker for each region or city that had questions. By clustering our data into cities, regions/states, countries, and continents, we could display relevant markers on the map for whatever zoom level the user was looking at.
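The zoom-dependent choice of cluster level can be sketched as a simple lookup. This is a minimal Python illustration; the `LocationType` names and the zoom thresholds are hypothetical values chosen for the example, not Travellr's actual configuration.

```python
from enum import Enum


class LocationType(Enum):
    """The four levels of the location hierarchy used for clustering."""
    CONTINENT = 1
    COUNTRY = 2
    REGION = 3  # regions/states
    CITY = 4


# Hypothetical thresholds: each entry is (highest zoom level, level to show).
# Zoom levels beyond the last threshold fall through to cities.
ZOOM_THRESHOLDS = [
    (2, LocationType.CONTINENT),
    (4, LocationType.COUNTRY),
    (7, LocationType.REGION),
]


def cluster_level_for_zoom(zoom: int) -> LocationType:
    """Return the hierarchy level whose markers should be drawn at this zoom."""
    if zoom < 0:
        raise ValueError("zoom level must be non-negative")
    for max_zoom, level in ZOOM_THRESHOLDS:
        if zoom <= max_zoom:
            return level
    return LocationType.CITY


print(cluster_level_for_zoom(1))  # LocationType.CONTINENT
print(cluster_level_for_zoom(6))  # LocationType.REGION
```

Keeping the thresholds in a table rather than in branching code makes them easy to tune as the map's marker density changes.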

Optimizing for Clustering
Our next challenge was how to extract clustered data from our database without causing excessive server load. Every time the user pans and zooms on the map, we need to query and fetch new clustered data in order to display the markers on the map. We might also have to limit the data if the user has selected a tag, as we’re then only interested in questions related to that topic (e.g., “surfing”). Doing this in real time would be painfully slow, as we would need to cluster thousands of questions in thousands of locations with hundreds of tags on the fly. The answer? Pre-cluster your data, of course!

Step 1. Structure your location data
When a question is asked about a city on Travellr, we also know its region/state, country and continent. We store more than 55,000 location points as a hierarchy, with each location “owning” its descendant nodes (and all of their data). Our locations are stored in a Modified Preorder Tree (also called Nested Sets). Modified Preorder Trees are a popular method of storing hierarchical data in a flat database table, with a focus on efficient data retrieval and easy handling of subtrees. For each location we also keep a record of its depth within the tree, its location type (continent, country, region/state, or city), and its co-ordinates (retrieved using the Google Maps geocoder).
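The following sketch shows how a nested-set table can be built and queried, using Python's built-in `sqlite3` module and an in-memory database. The miniature hierarchy, the `locations` table layout, and the helper names are hypothetical and only illustrate the technique. Each node gets a `lft` number when a preorder walk enters it and an `rgt` number when the walk leaves it, so a whole subtree can be fetched with one range comparison instead of a recursive query.

```python
import sqlite3

# Hypothetical miniature hierarchy: (name, location type, children).
TREE = ("Europe", "continent", [
    ("France", "country", [
        ("Île-de-France", "region", [("Paris", "city", [])]),
        ("Provence-Alpes-Côte d'Azur", "region", [("Nice", "city", [])]),
    ]),
])


def assign_nested_sets(node, counter=None, depth=0, rows=None):
    """Walk the tree in preorder, numbering each node's left and right edges."""
    if counter is None:
        counter = [1]  # mutable so recursive calls share one running number
    if rows is None:
        rows = []
    name, loc_type, children = node
    row = [name, loc_type, depth, counter[0], None]  # rgt is filled in below
    counter[0] += 1
    rows.append(row)
    for child in children:
        assign_nested_sets(child, counter, depth + 1, rows)
    row[4] = counter[0]  # every descendant has now been numbered
    counter[0] += 1
    return rows


def build_locations_db(tree):
    """Create an in-memory table holding one row per location."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE locations "
        "(name TEXT, type TEXT, depth INTEGER, lft INTEGER, rgt INTEGER)"
    )
    conn.executemany(
        "INSERT INTO locations VALUES (?, ?, ?, ?, ?)",
        [tuple(row) for row in assign_nested_sets(tree)],
    )
    return conn


def descendants(conn, name):
    """Return every location inside the named node's (lft, rgt) interval."""
    query = """
        SELECT child.name
        FROM locations AS parent
        JOIN locations AS child
          ON child.lft > parent.lft AND child.rgt < parent.rgt
        WHERE parent.name = ?
        ORDER BY child.lft
    """
    return [row[0] for row in conn.execute(query, (name,))]


conn = build_locations_db(TREE)
print(descendants(conn, "France"))
```

Here `descendants(conn, "France")` returns the two regions and two cities below France in preorder, while a leaf such as Paris has an empty interval and returns nothing.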

Step 2. Aggregate your data
We calculate aggregate data for every branch of our locations tree ahead of time. By storing aggregate data for cities, regions/states, countries, and continents, we provide an extremely fast and inexpensive method to query our locations database for any information regarding questions asked about a particular location. This data is updated every few minutes by a server-side task.

Our aggregations include:

  • Total question count for a location
  • Most popular tags for that location
  • Number of questions associated with each of those tags.
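A minimal sketch of this pre-aggregation step, assuming each question is attached to a single city and carries a list of tags. The `PARENT` map, the sample questions, and the `top_n` parameter are hypothetical. Every question is counted once for its city and once for each ancestor, so a continent's total equals the sum over the cities below it. The article says Travellr refreshes its aggregates every few minutes with a server-side task; this sketch simply recomputes everything from scratch.

```python
from collections import Counter, defaultdict

# Hypothetical parent pointers: each location maps to the node that owns it.
PARENT = {"Paris": "France", "Nice": "France", "France": "Europe", "Europe": None}

# Hypothetical questions, each asked about one city: (city, tags).
QUESTIONS = [
    ("Paris", ["museums", "food"]),
    ("Paris", ["food"]),
    ("Nice", ["surfing", "food"]),
]


def self_and_ancestors(location, parent):
    """Yield the location, then each ancestor up to the root."""
    while location is not None:
        yield location
        location = parent[location]


def aggregate(questions, parent, top_n=2):
    """Roll question and tag counts up through every level of the hierarchy."""
    totals = Counter()
    tag_counts = defaultdict(Counter)
    for city, tags in questions:
        unique_tags = set(tags)  # count each question at most once per tag
        for location in self_and_ancestors(city, parent):
            totals[location] += 1
            tag_counts[location].update(unique_tags)
    return {
        location: {
            "total": totals[location],
            # (tag, number of questions with that tag), most popular first
            "top_tags": tag_counts[location].most_common(top_n),
        }
        for location in totals
    }


summary = aggregate(QUESTIONS, PARENT)
print(summary["France"])
```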

How we query our structured, aggregate data on the map
Whenever the user zooms or pans the map we fire off a query to our (unpublished ;) API with the tags they are searching for, the current zoom level, and the edge co-ordinates of the map’s bounding box. Based on the zoom level (Figure 2) we work out whether we want to display markers for continents, countries, states, or cities. We then send back the data for these markers and display them on the map.
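The server side of such a request might look something like the sketch below, which filters pre-aggregated clusters by hierarchy level, bounding box, and an optional tag. It is a hypothetical in-memory version: the `Cluster` record, the `query_markers` signature, the zoom thresholds, and the approximate sample coordinates are all assumptions, and a production service would more likely run the equivalent filter as a database query against the aggregate tables.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """One pre-aggregated marker: a location plus its question counts."""
    name: str
    level: str  # "continent", "country", "region" or "city"
    lat: float
    lng: float
    total: int
    tag_counts: dict = field(default_factory=dict)


def level_for_zoom(zoom):
    """Hypothetical zoom thresholds, not Travellr's actual values."""
    if zoom <= 2:
        return "continent"
    if zoom <= 4:
        return "country"
    if zoom <= 7:
        return "region"
    return "city"


def in_bounds(lat, lng, south, west, north, east):
    """Test a point against the viewport, allowing it to span the 180th meridian."""
    if not south <= lat <= north:
        return False
    if west <= east:
        return west <= lng <= east
    return lng >= west or lng <= east


def query_markers(clusters, zoom, south, west, north, east, tag=None):
    """Return the markers to draw for one pan/zoom request."""
    level = level_for_zoom(zoom)
    markers = []
    for c in clusters:
        if c.level != level or not in_bounds(c.lat, c.lng, south, west, north, east):
            continue
        count = c.total if tag is None else c.tag_counts.get(tag, 0)
        if count > 0:  # hide locations with no questions on the chosen tag
            markers.append({"name": c.name, "lat": c.lat, "lng": c.lng, "count": count})
    return markers


# Hypothetical sample data with approximate coordinates.
CLUSTERS = [
    Cluster("Europe", "continent", 50.0, 10.0, 3, {"food": 3, "surfing": 1}),
    Cluster("France", "country", 46.6, 2.2, 3, {"food": 3, "surfing": 1}),
    Cluster("Australia", "country", -25.3, 133.8, 5, {"surfing": 4}),
    Cluster("Paris", "city", 48.86, 2.35, 2, {"food": 2}),
    Cluster("Nice", "city", 43.70, 7.27, 1, {"food": 1, "surfing": 1}),
]

print(query_markers(CLUSTERS, zoom=10, south=42, west=-5, north=51, east=9, tag="surfing"))
```

Because the tag filter reads precomputed `tag_counts`, each request scans only the stored clusters rather than the underlying questions, which is the point of pre-clustering.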

Figure 2. Clustering at different zoom levels (blue = continents, countries, pink = states, cities)

Everyone Wins
So what is the result of structuring and aggregating our data in this way? It means that we have nicely organized, pre-clustered data that can be read cheaply and easily. This allows us to provide a super-fast map interface for Travellr that puts minimal load on our infrastructure. Everyone is happy!

Comments or Questions?
We’d love to hear from you if you have any questions on how we did things, or suggestions or comments about Travellr’s map. This article was written by Travellr’s performance and scalability expert Michael Shaw (from Insight4) and our client-side scripting aficionado Jaidev Soin.

You can visit Travellr at www.travellr.com, or follow us on Twitter at twitter.com/travellr.

Gitmo In Limbo (Wash Post)

Wednesday, February 18th, 2009

[Editor's note: While President Obama has committed to closing the military prison at Guantanamo Bay within a year, it's hard to know what to do with some of the prisoners.

This graphic reminds me of the old adage that people can deal with only 5±2 things at once. There are almost 200 countries in the world, and it's hard to keep track of them all. But there are only 7 continents, and those are easy to remember because that number fits the 5±2 rule. So instead of listing all those countries alphabetically or ordered by number of detainees, it is sometimes more useful to group them first by geographic "region". Note: Washington Post style treats the Middle East as a separate continent-level region from Asia. Thanks also to Laris for formulating these ideas with me.

Why wasn't this information shown on a map instead of in a structured table with charting? For several reasons. Geography, while useful as a metaphorical principle, is not the most important thematic (organizing) principle in this distribution. We know nothing about where the individual detainees come from within each country, so we would have had to create a by-country choropleth map, which would have given false importance to larger countries like China and made it hard to show the three thematic subcategories. We could have placed thematic symbols (one for each detainee, color-coded by status as in the table) on each country, but then it would have been harder to compare countries by number and type of detainee, because the entries would not have shared a common baseline. A table with charting accomplishes our goals: we list the countries sorted by number of detainees and grouped by continent. This serves the same function a map would have in indicating where each country is (the metaphorical principle, reminding readers of the country's location in the network topology). And because of the common chart axis baseline, we can easily compare the quantities and thematic types associated with those countries at a glance.

What exactly are continents, anyhow? Geology seems to have moved on to plate tectonics, with 20-some major plates that often meet or rip apart the middle of "continents", but continents remain popular, I think, precisely because of the 5±2 rule.

Some cartographers are moving beyond physical-geography "continents" to top-level cultural regions. Allan Cartography's Raven world map does exactly this; take a look. The same holds true for any large set of thematic data: find the trends, group them together, and use that hierarchy (topology) as an access metaphor. And remember that geography doesn't always have to mean a map. Your users will thank you.]

Republished from The Washington Post.
Originally published: 16 February 2009.
Reporting by Julie Tate.

About a third of the detainees held at Guantanamo are either facing charges or approved for release. The rest are judged to be enemy combatants, and it is unclear whether they will be prosecuted, be released or continue to be held.


RELATED ARTICLE:
4 Cases Illustrate Guantanamo Quandaries
Administration Must Decide Fate of Often-Flawed Proceedings, Often-Dangerous Prisoners

Washington Post Staff Writer
Monday, February 16, 2009; Page A01

In their summary of evidence against Mohammed Sulaymon Barre, a Somali detained at Guantanamo Bay, military investigators allege that he spent several years at Osama bin Laden’s compound in Sudan. But other military documents place him in Pakistan during the same period.

One hearing at Guantanamo cited his employment for a money-transfer company with links to terrorism financing. Another file drops any mention of such links.

Barre is one of approximately 245 detainees at the military prison in Cuba whose fate the Obama administration must decide in coming months. Teams of government lawyers are sorting through complex, and often flawed, case histories as they work toward President Obama’s commitment to close the facility within a year.

Much of the government’s evidence remains classified, but documents in Barre’s case, and a handful of others, underscore the daunting legal, diplomatic, security and political challenges.

As officials try to decide who can be released and who can be charged, they face a series of murky questions: what to do when the evidence is contradictory or tainted by allegations of torture; whether to press charges in military or federal court; what to do if prisoners are deemed dangerous but there is little or no evidence against them that would stand up in court; and where to send prisoners who might be killed or tortured if they are returned home.

Answering those questions, said current and former officials, is a massive undertaking that has been hampered by a lack of cooperation among agencies and by records that are physically scattered and lacking key details.

Continue reading at Washington Post . . .