Posts Tagged ‘cities’

My WhereCampPDX keynote presentation (Kelso)

Friday, October 8th, 2010

I presented the keynote last month at WhereCampPDX, a fun, free “unconference” in Portland, Oregon focusing on all things geospatial. There were lots of discussions and I met great people. The PDF of my presentation can be downloaded at kelso.it/x/pdx.

I talked about “cities and the people that live in them” with particular focus on how we count people, and on how thematic grouping and enumeration unit size change with map scale, which has a specific impact on geofencing and on choosing which cities to show at different web map zoom levels. The biggest hole in GeoNames.org and other gazetteers is the 3rd world, primarily India and China but also Africa, which is also where most population growth will occur over the next generation.

Here are some preview slides:

[Seven preview slide screenshots]

Preview of Natural Earth version 1.2 populated places

Tuesday, May 4th, 2010

Version 1.1 brought Natural Earth up to ~7,000 populated places (purple hollow circle icons with labels). Version 1.2 will increase that by 25 times to about 175,000 populated places. It will be available as a supplement to the 1.1 version selection. What does this get you? A 1:1 million scale map of cities around the world and a 1:250,000 scale map of the United States and other select countries. There’s still basic selection work to be accomplished (Santiago, Chile has duplicate points now, as does London) and scale ranks need refining (boosting the blue 10 million, 5 million and 2 million selections from the 1:1 million black dots on these preview maps).

Because the world’s geo infrastructure sucks, not all the new features will have population counts in the 1.2 version. But most should have areal extent bounds and nesting to indicate if the town is part of a larger metro area. At the 1:250,000 scale (gmaps zoom 11), we start to see actual incorporated towns and unincorporated suburbs, but at the 1:1m scale we’re still dealing primarily in metropolitan and micropolitan features (urban areas that host multiple “cities”).
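The pairing of 1:250,000 with gmaps zoom 11 can be checked with a little arithmetic. This is a rough sketch, assuming standard Web Mercator tiles of 256 pixels and a 96 dpi screen (both assumptions, and the function names are mine); real displays vary, so treat the results as ballpark figures.

```python
import math

EARTH_RADIUS_M = 6378137.0   # WGS84 equatorial radius used by Web Mercator
TILE_SIZE_PX = 256           # assumed tile size
SCREEN_DPI = 96              # assumed screen resolution
METRES_PER_INCH = 0.0254


def scale_denominator(zoom, latitude_deg=0.0):
    """Approximate map scale denominator (the N in 1:N) at a web map zoom level."""
    circumference = 2 * math.pi * EARTH_RADIUS_M * math.cos(math.radians(latitude_deg))
    ground_metres_per_pixel = circumference / (TILE_SIZE_PX * 2 ** zoom)
    screen_pixels_per_metre = SCREEN_DPI / METRES_PER_INCH
    return ground_metres_per_pixel * screen_pixels_per_metre


def nearest_zoom(target_denominator, latitude_deg=0.0, max_zoom=20):
    """Zoom level whose scale is closest (in ratio terms) to the target scale."""
    return min(
        range(max_zoom + 1),
        key=lambda z: abs(math.log(scale_denominator(z, latitude_deg) / target_denominator)),
    )


for target in (10_000_000, 1_000_000, 250_000):
    z = nearest_zoom(target)
    print(f"1:{target:,} is closest to zoom {z} (about 1:{scale_denominator(z):,.0f})")
```

Under these assumptions zoom 11 works out to a bit under 1:290,000 at the equator, which is close enough to 1:250,000 for choosing which populated places to show.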

The names of the features will also need work, but that will occur after the 1.2 release (India, China, and Central Asia mostly). The version 1.1 locations will be shifted over to use the more accurate geoNames lngLats for about 6,000 features (note Oakland below). Locations were fine at 1:10,000,000 scale but don’t always hold up when zooming in. A later update will incorporate an additional 100,000 places to flesh out the 1:1m scale and maybe a few extra for closer in. Combine these populated places with roads and they start looking like atlas plates 🙂

More preview images after the jump.

sfbayarea

haiti

iraq

More preview maps after the jump.


Natural Earth Vector Preview: Cities (Part 2)

Thursday, November 12th, 2009

Announced at NACIS in Sacramento, California in October, we’re closing in on final release of Natural Earth vector and raster map data.

Bill Buckingham wrapped up processing the Natural Earth Vector cities (populated places point locations) this week. I’ve been honing our admin-0 and admin-1 rankings and feature names (only 4,000 states and provinces around the world, whew!).

Bill’s added population estimates for each city based on LandScan. The technique lets the user know both the relative “regional” importance of a town regardless of its population, based on the map scales at which the feature should be visible (thanks to Dick Furno), AND how many people live there.

By taking a composite of both, you can still show small population cities that are regionally important at a small type point size along with larger populated places at the smaller map scales.
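Here is a minimal sketch of that composite in Python. The records, the rank cutoff, and the population class breaks are all made up for illustration, and I am assuming that a lower scale rank means "visible at smaller map scales"; none of this is the actual Natural Earth schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Place:
    name: str
    scale_rank: int            # assumed: lower rank = visible at smaller map scales
    population: Optional[int]  # None where no estimate is available


def label_size_pt(population: Optional[int]) -> int:
    """Type size from population; the class breaks are illustrative."""
    if population is None or population < 100_000:
        return 7
    if population < 1_000_000:
        return 9
    return 12


def select_labels(places: List[Place], max_rank: int) -> List[Tuple[str, int]]:
    """Keep places whose rank fits the map scale, then size each by population."""
    visible = [p for p in places if p.scale_rank <= max_rank]
    visible.sort(key=lambda p: (p.scale_rank, p.name))
    return [(p.name, label_size_pt(p.population)) for p in visible]


# Made-up records for illustration only.
places = [
    Place("Big Metro", scale_rank=1, population=8_000_000),
    Place("Remote Hub", scale_rank=2, population=30_000),
    Place("Large Suburb", scale_rank=7, population=400_000),
]
labels = select_labels(places, max_rank=3)
print(labels)
```

The rank decides whether a place appears at all and the population only decides how big its label is, so the regionally important "Remote Hub" survives in small type while the more populous "Large Suburb" waits for a larger map scale.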

We have about 6,500 cities in Natural Earth Vector. Over 90% of those have population estimates (the ones that don’t are usually out in the boondocks). Together, our cities capture over 3 billion people or half of humanity.

For comparison, most other populated place GIS files have only 2,000 some cities and they focus on country and first order administrative capitals with a bare smattering of other towns. For instance: Lagos, Nigeria or San Francisco, California. This makes smaller countries with lots of administrative divisions (like Slovenia, Vietnam, or Jamaica) seem way more populated than larger countries with larger administrative divisions (like the United States). See the North America screenshot below for an example and look at the Caribbean versus the United States.

They also don’t estimate populations, and if they do they use official census numbers that hide the true “metro”-style counting of people that should inform a thematic map regardless of formal administrative boundaries at the smaller map scales that Natural Earth excels at.

Now for some screenshots:

(Scale ranks, followed by a population view color coded like the scale ranks with no-data cities as green dots, then a version with ESRI cities overlaid as cyan dots)

0world_ranks

0world_population

1no_amer_ranks

1no_amer_population

1no_amer_esri

2us_ranks

2us_population

2us_esri

More continents o’ dots after the jump.


Natural Earth Vector Preview: Cities

Tuesday, August 18th, 2009

We’re closing in on having the cities for Natural Earth Vector complete. The final compilation has been made (focusing on universal coverage based on regional importance, even if a town has a smaller than typical population). Dick Furno has headed up this data theme and is halfway through applying 8 scale ranks to the cities. Population estimates will be added in a final step by another contributor. Screenshots show quick plots of the GIS data. Color implies ranking.

California (below):

california

Alaska-Yukon (below):

alaska-yukon

Europe Biggest Cities (below):

europeless

Europe All Cities:

europemore

Beijing-Tokyo biggest cities (below):

beijing-tokyo

Travellr: Behind the Scenes of our Region-Based Clusters (Google GeoDev)

Monday, July 6th, 2009

[Editor’s note: The age-old rule for choropleth mapping that suggests aggregation by multi-scale areal units based on the map’s zoom level is slowly seeping into “clustering” for the point-based mashup geo community. This overview from Travellr published on the Google GeoDevelopers blog includes two illustrations that show the power of this technique. I used such a technique (different implementation) on The Washington Post’s recent swine flu mapping.]

Republished from Google GeoDevelopers Blog.
Monday, June 22, 2009

Recently, there has been a lot of interest in clustering algorithms. The client-side grid-based MarkerClusterer was released in the open source library this year, and various server-side algorithms were discussed in the Performance Tips I/O talk. We’ve invited the Travellr development team to give us insight on their unique regional clustering technique.

Travellr is a location aware answers service where people can ask travel-related questions about anywhere in the world. One of its features is a map-based interface to questions on the site using Google Maps.

Figure 1. An example of the Travellr Map, showing question markers for Australia.

Clustering for usability
We learned that the best way to display markers without cluttering our map was to cluster our questions depending on how far you zoom in. If the user was looking at a map of the continents, we would cluster our questions into a marker for each continent. If the user zoomed-in to France we would then cluster our questions into a marker for each region or city that had questions. By clustering our data into cities, regions/states, countries, and continents, we could display relevant markers on the map depending on what zoom level the user was looking at.

Optimizing for Clustering
Our next challenge was how to extract clustered data from our database without causing excessive server load. Every time the user pans and zooms on the map, we need to query and fetch new clustered data in order to display the markers on the map. We also might have to limit the data if the user has selected a tag, as we’re only interested in questions related to a topic (e.g. “surfing”). To execute this in real-time would be painstakingly slow, as you would need to cluster thousands of questions in thousands of locations with hundreds of tags on the fly. The answer? Pre-cluster your data of course!

Step 1. Structure your location data
When a question is asked about a city on Travellr, we also know its region/state, country and continent. We store more than 55,000 location points as a hierarchy, with each location “owning” its descendent nodes (and all of their data). Our locations are stored in a Modified Preorder Tree (also called Nested Sets). Modified Preorder Trees are a popular method of storing hierarchical data in a flat database table, having a focus on efficient data retrieval, and easy handling of sub trees. For each location we also keep a record of its depth within the tree, its location type (continent, country, region/state, or city), and its co-ordinates (retrieved using the Google Maps geocoder).
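[Editor’s sketch: for readers new to nested sets, here is a minimal in-memory version in Python. It is not Travellr’s code; the tiny hierarchy and the function names are mine. Each node gets a left and right number from one preorder walk, and a node’s descendants are exactly the nodes whose numbers fall between its own, which in a database becomes a single range query.]

```python
def number_tree(tree, root):
    """Assign (left, right, depth) to every node with one preorder walk.

    tree maps a node name to the list of its children.
    """
    rows = {}
    counter = 0

    def visit(node, depth):
        nonlocal counter
        counter += 1
        left = counter
        for child in tree.get(node, []):
            visit(child, depth + 1)
        counter += 1
        rows[node] = (left, counter, depth)

    visit(root, 0)
    return rows


def descendants(rows, node):
    """All nodes nested inside `node`, in preorder (no recursion needed)."""
    left, right, _ = rows[node]
    inside = [n for n, (l, r, _d) in rows.items() if left < l and r < right]
    return sorted(inside, key=lambda n: rows[n][0])


# Illustrative hierarchy: continent > country > region/state > city.
tree = {
    "Oceania": ["Australia"],
    "Australia": ["New South Wales", "Victoria"],
    "New South Wales": ["Sydney"],
    "Victoria": ["Melbourne"],
}
rows = number_tree(tree, "Oceania")
print(rows["Australia"])               # (left, right, depth)
print(descendants(rows, "Australia"))
```

[The trade-off is that reads are cheap while inserting a location means renumbering part of the tree, which suits a locations table that changes rarely.]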

Step 2. Aggregate your data
We calculate aggregate data for every branch of our locations tree ahead of time. By storing aggregate data for cities, regions/states, countries, and continents, we provide an extremely fast and inexpensive method to query our locations database for any information regarding questions asked about a particular location. This data is updated every few minutes by a server-side task.

Our aggregations include:

  • Total question count for a location
  • Most popular tags for that location
  • Number of questions associated with each of those tags.
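[Editor’s sketch: the three aggregates above can be rolled up the hierarchy in one pass. This Python version is only an illustration with made-up questions and a hand-written parent table; Travellr’s server-side task is not published.]

```python
from collections import Counter, defaultdict

# Hypothetical parent lookup: each location points at the location that owns it.
PARENT = {
    "Sydney": "New South Wales",
    "Melbourne": "Victoria",
    "New South Wales": "Australia",
    "Victoria": "Australia",
    "Australia": "Oceania",
    "Oceania": None,
}


def aggregate(questions, parent):
    """Credit each question to its city and to every ancestor of that city."""
    totals = Counter()                # location -> total question count
    tag_counts = defaultdict(Counter)  # location -> tag -> question count
    for city, tags in questions:
        node = city
        while node is not None:
            totals[node] += 1
            tag_counts[node].update(tags)
            node = parent[node]
    return totals, tag_counts


# Made-up questions: (city, tags on the question).
questions = [
    ("Sydney", ["surfing", "beaches"]),
    ("Sydney", ["surfing"]),
    ("Melbourne", ["coffee"]),
]
totals, tag_counts = aggregate(questions, PARENT)
print(totals["Australia"])                      # total question count
print(tag_counts["Australia"].most_common(2))   # most popular tags with counts
```

[Because the work happens ahead of time, the map request only has to read finished rows for the locations in view.]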

How we query our structured, aggregate data on the map
Whenever the user zooms or pans the map we fire off a query to our (unpublished 😉) API with the tags they are searching for, the current zoom level, and the edge co-ordinates of the map’s bounding box. Based on the zoom level (Figure 2) we work out whether we want to display markers for continents, countries, states, or cities. We then send back the data for these markers and display them on the map.
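[Editor’s sketch: since the API is unpublished, the following Python is a guess at the shape of such a query, not Travellr’s implementation. The zoom cutoffs, field names, and counts are hypothetical, the coordinates are approximate, and the bounding-box test ignores views that cross the antimeridian.]

```python
def level_for_zoom(zoom):
    """Pick which kind of location to show; the cutoffs are illustrative."""
    if zoom <= 2:
        return "continent"
    if zoom <= 5:
        return "country"
    if zoom <= 8:
        return "region"
    return "city"


def markers_for_view(locations, zoom, bbox, tag=None):
    """Return (name, question count) for pre-aggregated locations in view.

    bbox is (south, west, north, east) in degrees.
    """
    south, west, north, east = bbox
    level = level_for_zoom(zoom)
    markers = []
    for loc in locations:
        if loc["type"] != level:
            continue
        if not (south <= loc["lat"] <= north and west <= loc["lng"] <= east):
            continue
        # With a tag selected, use that tag's count instead of the total.
        count = loc["total"] if tag is None else loc["tags"].get(tag, 0)
        if count > 0:
            markers.append((loc["name"], count))
    return markers


# Pre-aggregated rows with made-up counts.
LOCATIONS = [
    {"name": "Australia", "type": "country", "lat": -25.0, "lng": 134.0,
     "total": 3, "tags": {"surfing": 2, "coffee": 1}},
    {"name": "Sydney", "type": "city", "lat": -33.87, "lng": 151.21,
     "total": 2, "tags": {"surfing": 2}},
    {"name": "Melbourne", "type": "city", "lat": -37.81, "lng": 144.96,
     "total": 1, "tags": {"coffee": 1}},
]
view = (-45.0, 110.0, -10.0, 155.0)
print(markers_for_view(LOCATIONS, zoom=4, bbox=view))
print(markers_for_view(LOCATIONS, zoom=10, bbox=view, tag="surfing"))
```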

Figure 2. Clustering at different zoom levels (blue = continents, countries, pink = states, cities)

Everyone Wins
So what is the result of structuring and aggregating our data in such a way? It means that we have nicely organized, pre-clustered data that can be read from cheaply and easily. This allows us to provide a super-fast map interface for Travellr that puts minimal load on our infrastructure. Everyone is happy!

Comments or Questions?
We’d love to hear from you if you have any questions on how we did things, or suggestions or comments about Travellr’s map. This article was written by Travellr’s performance and scalability expert Michael Shaw (from Insight4) and our client-side scripting aficionado Jaidev Soin.

You can visit Travellr at www.travellr.com, or follow us on Twitter at twitter.com/travellr.

Measuring Informational Distance Between Cities

Tuesday, April 29th, 2008

The folks over at bestiario.org have posted a curious visualization showing the “google” distance between pairs of cities around the globe, nicely visualized in rotatable 3d. What is “google” distance? From their website:

This tridimensional scheme represents the strength of relations between cities from searches on Google. The main idea is to compare the number of pages on the internet where the two cities appear close to each other, with the number of pages they appear isolated. This proportion indicates some kind of intensity of relation between the cities. After measuring this “google proximity” we divide it by its geographical distance. By this process we obtain an indicator about the strength of relation in spite of the real distance, a kind of informational distance between cities.  

They call that measure the google-platonic distance between cities, which is explained further on their page…
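To make the description concrete, here is a rough Python reading of it. The exact formula lives on their page, so this is my assumption: I take “google proximity” to be a simple ratio of together-pages to isolated-pages, I divide by great-circle distance as the quote says, and the page counts in the example are invented.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def google_proximity(pages_together, pages_isolated):
    """Assumed form: pages naming both cities relative to pages naming only one."""
    return pages_together / pages_isolated


def informational_measure(pages_together, pages_isolated, distance_km):
    """Proximity divided by geographic distance, following the quoted description."""
    return google_proximity(pages_together, pages_isolated) / distance_km


# Invented page counts for two cities a quarter of the way around the equator.
distance = haversine_km(0.0, 0.0, 0.0, 90.0)
print(round(distance, 1))
print(informational_measure(pages_together=50, pages_isolated=1000, distance_km=distance))
```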

Here is a partial visualization (image below). Click on the image to launch their tool, see examples, and spin the globe. The visualization could be helped by showing it on a real sphere, but the linearized sphere no doubt spins quicker. Thanks Curt!

google informational distance